
Conference ranger::pwosf

Title: PATHWORKS for OSF/1
Notice: see also NOTED::PWDOSWINV5 (PW client) & TURRIS::DIGITAL_UNIX
Moderator: CPEEDY::LONG
Created: Thu Apr 22 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1874
Total number of notes: 6870

1859.0. "5.g eco 2 crash / process disappears" by KERNEL::BURNST () Tue May 20 1997 10:15

    
    Can anybody tell me what can cause the following event log entry?
    SERVER              3100                05-19-97 04:36PM
    NET3100:   The operation failed because a network software error
    occurred.
    86 17 4C 4D 55 36 30 32 32 3A 20 20 20 49 6E 74        ..LMU6022:  Int
    65 72 6E 61 6C 20 73 65 72 76 65 72 20 74 6F 20        ernal server to
    73 65 72 76 65 72 20 63 6F 6D 6D 75 6E 69 63 61        server communica
    74 69 6F 6E 73 20 66 61 69 6C 75 72 65 2E 0A           tions failure..
    
    
    This occurs on a cust system; PW then crashes without leaving any core
    files and needs a server restart to continue. Cust is running PW OSF
    5.0G eco 2 on Digital UNIX 3.2G.
    
    Cust has debug turned on. The system has been crashing under both 5.0f
    and 5.0G eco 2 ever since it was included in an NT domain.
    
    The cust is also seeing a lot of these entries:
    NET3100:   The operation failed because a network software error occurred.
    00 00 4C 69 73 74 65 6E 65 72 20 54 61 73 6B 3A        ..Listener Task:
    20 4D 69 73 6D 61 74 63 68 65 64 20 73 65 73 73         Mismatched sess
    69 6F 6E 20 73 65 71 75 65 6E 63 65 20 6E 75 6D        ion sequence num
    62 65 72 73 0A                                         bers.
    
    but I believe that this is normal on a network with NT systems.
    
    Cust has approx. 250 users connected to the system and has maxclients set
    to 400. The debug file is showing t_errno values of 9 and 7.
    
    PS: is there any info on what the debug file names mean, i.e. which
    processes created them?
    
    	Trev UK CSC.
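
The ASCII column of the event log dumps above replaces non-printable bytes with dots, so the exact message text is easier to confirm from the hex column. The following is a minimal Python sketch of that decoding, assuming each dump line starts with two-digit hex bytes separated by single spaces and followed by a wider gap before the ASCII column. The function names are illustrative and are not part of PATHWORKS.

```python
import re

# Leading run of two-digit hex bytes separated by single spaces.
# The wider gap before the ASCII column ends the run.
HEX_RUN = re.compile(r"^\s*((?:[0-9A-Fa-f]{2} )*[0-9A-Fa-f]{2})\b")


def decode_dump(lines):
    """Collect the raw bytes from the hex column of a dump."""
    data = bytearray()
    for line in lines:
        match = HEX_RUN.match(line)
        if match:
            data += bytes.fromhex(match.group(1))
    return bytes(data)


def printable(data):
    """Render bytes the way the dump's ASCII column does."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# The NET3100 data bytes quoted in note 1859.0.
DUMP = [
    "86 17 4C 4D 55 36 30 32 32 3A 20 20 20 49 6E 74        ..LMU6022:  Int",
    "65 72 6E 61 6C 20 73 65 72 76 65 72 20 74 6F 20        ernal server to",
    "73 65 72 76 65 72 20 63 6F 6D 6D 75 6E 69 63 61        server communica",
    "74 69 6F 6E 73 20 66 61 69 6C 75 72 65 2E 0A           tions failure..",
]

raw = decode_dump(DUMP)
print(raw[:2].hex())            # the two leading non-text bytes
print(raw[2:].decode("ascii"))  # the message text itself
```

The two leading bytes are not printable text; their meaning is not stated in this thread, so the sketch only separates them from the message. A text line that happens to begin with a standalone two-letter hex-looking word (for example "be") would be misread as a byte, so only dump lines should be passed in.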

1859.1. "" by EVTAI1::5dhcp17.evt.dec.com::castello (UNIX PCI Support) Fri May 23 1997 15:10 (16 lines)

Hi Trev,

Could you post your debug files on an ftp machine somewhere?

The debug file names (e.g. Debug-1029) only give the PID of the process
that wrote them. If the process crashes there is no way to know its name
from the file name; only reading the file contents can help.
If the system crashes, you can retrieve the name and the PID from
the crash trace.

Have you set the NetBIOS limits to match maxclients?


Jean-Pierre 
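
Since a debug file name only carries the PID, one workaround is to save a process listing while the server is still healthy and match the debug files against it later. The sketch below assumes the `Debug-<pid>` naming shown in the example above and a saved listing with one "PID COMMAND" pair per line. The snapshot contents, the PID-to-name pairings, and the function names are all hypothetical and for illustration only.

```python
import re

# File name pattern taken from the "Debug-1029" example in the reply.
DEBUG_NAME = re.compile(r"^Debug-(\d+)$")


def pid_from_debug_name(filename):
    """Return the PID embedded in a debug file name, or None."""
    match = DEBUG_NAME.match(filename)
    return int(match.group(1)) if match else None


def parse_ps_snapshot(text):
    """Parse 'PID COMMAND' lines into a {pid: command} dict."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any malformed lines.
        if len(parts) == 2 and parts[0].isdigit():
            table[int(parts[0])] = parts[1].strip()
    return table


def name_debug_files(filenames, snapshot):
    """Map each debug file name to a process name, if known."""
    result = {}
    for name in filenames:
        pid = pid_from_debug_name(name)
        if pid is not None:
            result[name] = snapshot.get(pid, "unknown")
    return result


# Hypothetical snapshot; these PID/name pairings are made up.
SNAPSHOT_TEXT = """\
  PID COMMAND
 1029 lmx.ctrl
 2044 lmx.srv
"""

snapshot = parse_ps_snapshot(SNAPSHOT_TEXT)
names = name_debug_files(["Debug-1029", "Debug-2044", "Debug-777"], snapshot)
for debug_file, process in sorted(names.items()):
    print(debug_file, process)
```

The listing has to be taken before the crash; once a process is gone, its PID cannot be resolved this way, which is the limitation described above.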


1859.2. "T_errno = 9 / 7" by KERNEL::BURNST () Tue May 27 1997 11:52 (47 lines)

    Jean-Pierre,
    
    Here is some further info (I have raised an IPMT as well).
    The problems started occurring once the NT domain these systems run in
    grew in size; the systems are now the main PC servers for YWS, which
    is migrating from VAXes running PW v4.2.
    When first noticed, the cust's system was running PW 5.0f eco 2.
    When this system went down it would send out the goodbye messages
    to all PCs on the network running winpopup. This did not happen just
    once, as expected: users have had the message blasted to their
    screens about 200 times on average and 700 times in the worst case.
    We have now added BYEMESSAGE=NO to lanman.ini when upgrading to
    5.0G eco 2.
    Cust believes there is a link between these crashes and the adding
    of a batch of users on the NT PDC: there have been 4 crashes, and
    all can be traced back to times when the IT dept was adding a batch
    of new users to the domain.
    Cust is now having users added outside prime shift to see if this makes
    a difference.
    
    Cust has not yet added any extra users, so we can't tell the effect yet.
     
    The debug file only seemed to show the following as a problem:
    listener.c   385 16:34:13 L0: Likely a rebooted client on lmx.srv 24229
    listener.c   408 16:34:13 L0: Send DELETECLIENT msg to server
    listener.c   427 16:34:13 L0: Wait for client to be deleted...
    listener.c   436 16:35:32 L0: Got response from lmx.srv
    listener.c   489 16:35:32 L0: Pass client to existing server process
    listener.c   757 16:35:32 L0: Connection request on fd 65
    listener.c   646 16:35:32 L0: Attempting accept: index 0 net 1, fd 65, acc_fd 2, seq# 352
    osftcp.c    1026 16:35:32 L0: t_accept (65, 2) failed, t_errno = 9
    osftcp.c    1044 16:35:32 L0: T_LISTEN received on fd 65
    listener.c   646 16:35:32 L0: Attempting accept: index 1 net 1, fd 65, acc_fd 8, seq# 175
    osftcp.c    1026 16:35:32 L0: t_accept (65, 8) failed, t_errno = 9
    osftcp.c    1044 16:35:32 L0: T_LISTEN received on fd 65
    listener.c   646 16:35:32 L0: Attempting accept: index 2 net 1, fd 65, acc_fd 9, seq# 439
    osftcp.c    1026 16:35:32 L0: t_accept (65, 9) failed, t_errno = 9
    
    and it then carried on giving t_errno = 9 / 7 until the server was
    restarted, as users could no longer work.
    
    This is now also raised as an IPMT.
    
    Trev UK CSC.
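
To see how often each t_errno value turns up in a long debug file, the failed t_accept lines can be tallied with a short script. The sketch below is a minimal Python example that assumes the log line layout quoted above and the standard XTI/TLI error numbering, in which 7 is TBADSEQ (invalid sequence number), 8 is TSYSERR and 9 is TLOOK (an asynchronous event is pending on the endpoint). That numbering should be checked against the xti.h header on the server itself. The function names are illustrative.

```python
import re
from collections import Counter

# Assumed standard XTI/TLI numbering; verify against <xti.h> on the server.
XTI_ERRORS = {
    7: "TBADSEQ",  # invalid sequence number
    8: "TSYSERR",  # system error, see errno
    9: "TLOOK",    # asynchronous event pending on the endpoint
}

# Matches lines such as:
#   osftcp.c    1026 16:35:32 L0: t_accept (65, 2) failed, t_errno = 9
ACCEPT_FAIL = re.compile(r"t_accept \((\d+), (\d+)\) failed, t_errno = (\d+)")


def tally_accept_failures(lines):
    """Count failed t_accept calls per t_errno value."""
    counts = Counter()
    for line in lines:
        match = ACCEPT_FAIL.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


def describe(errno):
    """Return the symbolic name for a t_errno value, if known."""
    return XTI_ERRORS.get(errno, "unknown")


# Lines copied from the debug file excerpt in this note.
SAMPLE = [
    "listener.c   757 16:35:32 L0: Connection request on fd 65",
    "osftcp.c    1026 16:35:32 L0: t_accept (65, 2) failed, t_errno = 9",
    "osftcp.c    1044 16:35:32 L0: T_LISTEN received on fd 65",
    "osftcp.c    1026 16:35:32 L0: t_accept (65, 8) failed, t_errno = 9",
    "osftcp.c    1044 16:35:32 L0: T_LISTEN received on fd 65",
    "osftcp.c    1026 16:35:32 L0: t_accept (65, 9) failed, t_errno = 9",
]

counts = tally_accept_failures(SAMPLE)
for errno, count in sorted(counts.items()):
    print(errno, describe(errno), count)
```

Under that numbering, a TLOOK failure followed by "T_LISTEN received" would be consistent with a new connection indication arriving on fd 65 while an earlier one was still being accepted, but the excerpt alone does not confirm the cause.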
1859.3. "" by CPEEDY::HORGAN () Wed May 28 1997 17:28 (7 lines)

It's good that you opened an IPMT. The server got a pipe error on the pipe that it uses to
communicate with the lmx.ctrl process. The most likely reason is that the lmx.ctrl process died.
You probably don't have enough knb sessions configured, because the NT systems are consuming them
with browsing sessions. Engineering has been working on improvements in lmx.ctrl for handling
a lot of simultaneous incoming session requests.

Julia