Conference noted::dnu_osi

Title:DECnet/OSI for {ULTRIX,OSF/1}
Notice:Indicate version and platform when writing...see #2 for kits
Moderator:BULEAN::CARR
Created:Wed Sep 25 1991
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2187
Total number of notes:10469

2121.0. "dlogind signal sent to child when connection closes abruptly?" by PANTER::MARTIN (Be vigilant...) Thu Jan 30 1997 12:40

    What signal does dlogind send to its child when the session
    is abruptly cut at the other end?
    
    This will help me to narrow down the problem at a customer site.
    They complain that when the dlogin sessions are cut abruptly, the
    application becomes a <defunct> process and eats a lot of CPU...
    
    This behaviour is not observed when the application is run from
    a LAT or a telnet connection; there, all the processes are properly
    killed.
    Only DECnet connections behave this way!
    
    My understanding is that the signal sent by telnetd is not the
    same as the one sent by dlogind when the connection is abruptly
    closed at the other end, and that the signal sent by dlogind could
    be set to SIG_INT at the application level... whereas the one sent
    by telnetd is properly handled.
    
    Thanks in advance for your info,
    
    				============================
    				Alain MARTIN/SSG Switzerland
T.R  Title  User  Personal Name  Date  Lines
2121.1  UPSAR::WALLACE  Digital: A Dilbertian Company  Thu Jan 30 1997 16:02  19
    Are we talking ULTRIX or OSF here?  On ULTRIX, dlogind does a
    "kill(0, SIGKILL)".  On OSF, it doesn't send any signal.  However,
    it does do a "revoke()" on the pty, which apparently telnetd 
    doesn't do.
    
    The man page for ps says:
    
    [Digital]  The system puts exiting child processes in the <defunct>
    state if their parent process is still running and has not caught the
    SIGCHLD signal or executed a wait() system call.
    
    I would expect the parent of these problem processes to be some
    shell, not dlogind.  In any event, dlogind does catch SIGCHLD
    signals and does a waitpid() in the handler routine.
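
The handler pattern described above can be sketched in C as follows. This is a minimal illustration, not dlogind's source: the names on_sigchld, install_sigchld_reaper and reap_demo are made up for the example, and the polling loop in reap_demo exists only to make the sketch easy to exercise.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Number of children collected by the handler (illustrative only). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: collect every child that has already exited. */
void on_sigchld(int signo)
{
    int saved_errno = errno;            /* waitpid() may overwrite errno */
    (void)signo;
    /* WNOHANG makes waitpid() return at once when nothing is left to reap. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install the handler. Returns 0 on success, -1 on error. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork one child that exits at once, then wait (up to about 5 seconds)
 * for the handler to reap it. Returns the number of children reaped,
 * or -1 on error. */
int reap_demo(void)
{
    struct timespec nap = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    pid_t pid;
    int i;

    reaped_children = 0;
    if (install_sigchld_reaper() != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                        /* child: exit immediately */
    for (i = 0; i < 500 && reaped_children == 0; i++)
        nanosleep(&nap, NULL);
    return (int)reaped_children;
}
```

Calling reap_demo() from a small main() should report one reaped child. A parent that neither installs such a handler nor calls wait() leaves its exited children in the <defunct> state, as the ps man page excerpt says.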
    
    Hope this helps.
    
    vince
    
2121.2  applic programming bug or init bug with revoke() ?  PANTER::MARTIN  Be vigilant...  Fri Jan 31 1997 08:28  37
    Hi Vince,
    
    >> Are we talking ULTRIX or OSF here?  
    
    Sorry, I forgot to mention that it is on Digital Unix v3.2x!
    
    >> On OSF, it doesn't send any signal.  However, it does 
    >> do a "revoke()" on the pty, which apparently telnetd
    >> doesn't do.
    
    Interesting !
    
    >> I would expect the parent of these problem processes to be some
    >> shell, not dlogind.  In any event, dlogind does catch SIGCHLD
    >> signals and does a waitpid() in the handler routine.
    
    You are correct, the parent process of the application is a shell
    (I think it's "sh" but I should check...)
    
    But if we know that dlogind does a revoke() on the pty (I don't
    know exactly what that means in terms of the child-parent
    relationship), how would you explain that the child of the shell
    (the applic.) goes into the <defunct> state if both dlogind and
    the parent shell process disappear?
    
    I would expect the applic to become a child of the "init" process
    and be "cleaned" by "init"'s SIGCHLD signal handler!?
    
    As I mentioned in .0, it does work for LAT and telnet connections:
    all the processes are properly killed.
    So do you think it's an "init" bug with revoke() or could it be related
    to the applic programming ?
    
    Thanks for your help,
    
    					============================
    					Alain MARTIN/SSG Switzerland
2121.3  netrix.lkg.dec.com::thomas  The Code Warrior  Fri Jan 31 1997 12:14  3
A guess is that the revoke is preventing the SIGHUP from being sent to
the foreground process group.  dlogind should do a kill(0, SIGHUP)
before doing the revoke (and maybe a sleep(1) too).
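
The suggested ordering could be sketched in C roughly as follows. This is not dlogind's source: hangup_process_group and hangup_demo are made-up names, the demo runs in its own session (setsid) so that kill(0, ...) cannot reach unrelated processes, and the revoke() step appears only as a comment because revoke() is not available on every system.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGHUP to every process in the caller's process group, then give
 * them grace_seconds to exit. The caller ignores SIGHUP first so that it
 * survives its own kill(0, ...). Returns 0 on success, -1 on error. */
int hangup_process_group(unsigned int grace_seconds)
{
    struct sigaction ignore;
    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    if (sigaction(SIGHUP, &ignore, NULL) != 0)
        return -1;
    if (kill(0, SIGHUP) != 0)            /* pid 0: the caller's process group */
        return -1;
    sleep(grace_seconds);
    /* Assumption: the revoke() call on the pty would follow here. */
    return 0;
}

/* Demo: a stand-in "daemon" in a fresh session forks a worker, hangs up
 * its process group and reports how the worker ended. Returns the signal
 * number that terminated the worker, 0 if it exited normally, or a
 * value of 100 or more (or -1) on error. */
int hangup_demo(void)
{
    pid_t daemon_pid;
    int status;

    signal(SIGCHLD, SIG_DFL);            /* make sure waitpid() works */
    daemon_pid = fork();
    if (daemon_pid < 0)
        return -1;
    if (daemon_pid == 0) {               /* stand-in for the login daemon */
        pid_t worker;
        int wstatus;
        if (setsid() < 0)
            _exit(100);
        signal(SIGHUP, SIG_DFL);         /* worker inherits the default action */
        worker = fork();
        if (worker < 0)
            _exit(101);
        if (worker == 0) {
            for (;;)
                pause();                 /* stand-in for the user's application */
        }
        if (hangup_process_group(1) != 0)
            _exit(102);
        if (waitpid(worker, &wstatus, 0) != worker)
            _exit(103);
        _exit(WIFSIGNALED(wstatus) ? WTERMSIG(wstatus) : 0);
    }
    if (waitpid(daemon_pid, &status, 0) != daemon_pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch the worker keeps the default SIGHUP action, so hangup_demo() should report SIGHUP as the terminating signal. A shell or application that ignores or catches SIGHUP would survive the same sequence.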
2121.4  UPSAR::WALLACE  Digital: A Dilbertian Company  Mon Feb 03 1997 19:09  3
    I'm willing to code up Matt's suggestion if the customer is
    willing to give it a try.  --  Vince
    
2121.5  let me know when dlogind is modified...  LEMAN::MARTIN_A  Be vigilant...  Wed Feb 05 1997 06:58  6
    I can ask the customer to test it if you do modify dlogind...
    
    Let me know when the fix is available.
    
    Cheers,				============================
    					Alain MARTIN/SSG Switzerland
2121.6  UPSAR::WALLACE  Digital: A Dilbertian Company  Fri Feb 07 1997 13:28  7
    Hi,
    
    Copy over netrix::test/dlogind.note2121.Z and give it a try.  Don't
    forget dlogind needs to be setuid root.
    
    Vince
    
2121.7  modified "dlogind" does not help !  PANTER::MARTIN  Be vigilant...  Wed Mar 19 1997 13:20  27
    Hi Vince,
    
    We tried the modified dlogind and 8-( the result is the same...
    When the DECnet connection closes at the remote side, dlogind
    disappears, but the process launched at login time by .profile (a
    Korn shell script doing anything) becomes a child of "init"
    instead of being killed like its parent (dlogind).
    
    We discovered, however, that changing the user's login shell from
    ksh to csh does change the behaviour. In that case all the
    descendant processes of the dlogind process disappear.
    
    But using the same user account with ksh from telnet or LAT connections
    does not show the symptom (all the descendant processes of the dlogind
    process disappear too).
    
    Any idea ? 
    
    dlogind or ksh bug ????
    
    Sorry to be so late answering, but the customer was not ready to test,
    as the workaround we provided (connection through LAT) does work!
    
    Cheers,
    
    
    				Alain
2121.8  UPSAR::WALLACE  Digital: A Dilbertian Company  Thu Mar 20 1997 19:37  14
    Hi,
    
    That's a good clue about the ksh.  We've run into differences between
    shells before.
    
    I can get a backgrounded process to hang around after dlogind exits,
    but it does not run out of control.  It just continues to function
    normally.
    
    Can you get a copy of the .profile file?  And what happens if you
    kill dlogind, rather than breaking the network connection?
    
    Vince
    
2121.9  .profile & more info...  PANTER::MARTIN  Be vigilant...  Fri Mar 21 1997 07:02  78
    Hi Vince,
    
    >> Can you get a copy of the .profile file?  
    
    I just used Digital Unix's template and launched a simple ksh
    script from it at the end (customer's one is running an applic.
    on top of oracle db) :
    
    #
    # *****************************************************************
    # *                                                               *
    # *    Copyright (c) Digital Equipment Corporation, 1991, 1995    *
    # *                                                               *
    # *   All Rights Reserved.  Unpublished rights  reserved  under   *
    # *   the copyright laws of the United States.                    *
    # *                                                               *
    # *   The software contained on this media  is  proprietary  to   *
    # *   and  embodies  the  confidential  technology  of  Digital   *
    # *   Equipment Corporation.  Possession, use,  duplication  or   *
    # *   dissemination of the software and media is authorized only  *
    # *   pursuant to a valid written license from Digital Equipment  *
    # *   Corporation.                                                *
    # *                                                               *
    # *   RESTRICTED RIGHTS LEGEND   Use, duplication, or disclosure  *
    # *   by the U.S. Government is subject to restrictions  as  set  *
    # *   forth in Subparagraph (c)(1)(ii)  of  DFARS  252.227-7013,  *
    # *   or  in  FAR 52.227-19, as applicable.                       *
    # *                                                               *
    # *****************************************************************
    #
    # HISTORY
    #
    # @(#)$RCSfile: .profile,v $ $Revision: 4.1.3.4 $ (DEC) $Date: 1992/09/30 13:49:15 $
    #
    PATH=$HOME/bin:${PATH:-/usr/bin:.}
    export PATH
    stty dec
    tset -I -Q
    PS1="`hostname`> "
    MAIL=/usr/spool/mail/$USER
    ./sleep_600.ksh
    
    
    Here is my sleep_600.ksh
    
    #!/bin/ksh
    echo "Sleeping for 10 min."
    sleep 600
    
    >> And what happens if you kill dlogind, rather than breaking 
    >> the network connection?
    
    Exactly the same. When ksh is the user's login shell, ksh becomes
    a child of the init process instead of being killed by signal 9 (KILL),
    as if the ksh login shell hadn't received the SIGKILL signal.
    So the ksh process survives, and all its descendant processes
    survive as well.
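
For context, a SIGKILL sent to one process is delivered to that process only; the kernel does not forward it to the children, which are simply re-parented. The following C sketch illustrates just that point. It is not taken from dlogind or ksh, and orphan_demo and the one-byte pipe messages are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Kill a parent process with SIGKILL and check what happens to its child.
 * Returns 1 if the child outlived the parent and was re-parented,
 * 0 if it was not re-parented within about 5 seconds, -1 on error. */
int orphan_demo(void)
{
    struct timespec nap = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int fds[2];
    pid_t parent;
    char msg = 0;

    signal(SIGCHLD, SIG_DFL);             /* make sure waitpid() works */
    if (pipe(fds) != 0)
        return -1;
    parent = fork();
    if (parent < 0)
        return -1;
    if (parent == 0) {                    /* stand-in for the killed parent */
        pid_t child;
        close(fds[0]);
        child = fork();
        if (child < 0)
            _exit(1);
        if (child == 0) {                 /* stand-in for the surviving shell */
            pid_t original = getppid();
            int i;
            msg = 'R';                    /* tell the caller we are ready */
            if (write(fds[1], &msg, 1) != 1)
                _exit(1);
            /* Wait for the parent to go away, then report the outcome. */
            for (i = 0; i < 500 && getppid() == original; i++)
                nanosleep(&nap, NULL);
            msg = (getppid() != original) ? 'Y' : 'N';
            if (write(fds[1], &msg, 1) != 1)
                _exit(1);
            _exit(0);
        }
        for (;;)
            pause();                      /* wait here until killed */
    }
    close(fds[1]);
    if (read(fds[0], &msg, 1) != 1 || msg != 'R')
        return -1;
    kill(parent, SIGKILL);                /* only the parent receives this */
    waitpid(parent, NULL, 0);
    if (read(fds[0], &msg, 1) != 1)
        return -1;
    close(fds[0]);
    return msg == 'Y';
}
```

In this sketch the child is still alive after its parent has been killed, and it goes away only because it exits on its own. In the dlogin case, descendants that do disappear are therefore presumably reacting to something else, such as a SIGHUP or the loss of their terminal.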
    
    With a csh login shell, when we either break the connection at the
    remote side (VMS machine) or kill dlogind manually (SIGKILL), all
    the descendant processes of dlogind are killed.
    That's what we expect!
    
    We tested your modified dlogind (netrix::test/dlogind.note2121.Z)
    on both v3.2c and v3.2g, but we haven't noticed any difference.
    
    
    Thanks again for your help. Do you think it's time for an IPMT?
    (I understand you cannot work too long on unofficial requests,
    and the same goes for us...) I'd like to have a better idea of
    where the problem is (dlogind or ksh bug?) before filling in the
    IPMT form.
    
    Cheers,
    
    				Alain
    
2121.10  UPSAR::WALLACE  Digital: A Dilbertian Company  Fri Mar 21 1997 16:32  6
    I think it is probably time to open an IPMT case.  Since things
    seem to work OK with lat & telnet, open the case against DECnet.
    If it turns out to be ksh after all we'll forward it up to OSG.
    
    Vince
    
2121.11  Ok, I'll fill up an IPMT form against DECnet.  PANTER::MARTIN  Be vigilant...  Tue Mar 25 1997 06:32  10
    Hi Vince,
    
    I'll open an IPMT then against DECnet, but will wait until the
    onsite engineer comes back from holiday (next week) to collect
    all the necessary info.
    
    Thanks for your help.
    
    Alain