
Conference ranger::pwosf

Title:	PATHWORKS for OSF/1
Notice:	see also NOTED::PWDOSWINV5 (PW client) & TURRIS::DIGITAL_UNIX
Moderator:	CPEEDY::LONG

Created:	Thu Apr 22 1993
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1874
Total number of notes:	6870

1859.0. "5.0G ECO 2 crash / process disappears" by KERNEL::BURNST () Tue May 20 1997 10:15

    
    Can anybody tell me what can cause the following error?
    SERVER              3100                05-19-97 04:36PM
    NET3100:   The operation failed because a network software error
    occurred.
    86 17 4C 4D 55 36 30 32 32 3A 20 20 20 49 6E 74        ..LMU6022:  Int
    65 72 6E 61 6C 20 73 65 72 76 65 72 20 74 6F 20        ernal server to
    73 65 72 76 65 72 20 63 6F 6D 6D 75 6E 69 63 61        server communica
    74 69 6F 6E 73 20 66 61 69 6C 75 72 65 2E 0A           tions failure..
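The error log shows each message as rows of hex bytes, with an ASCII rendering on the right in which non-printable bytes (here 0x86, 0x17 and the trailing newline 0x0A) appear as `.`. Below is a minimal Python sketch that rebuilds that right-hand column from the hex, which helps when a dump's text column is lost or mangled. The name `decode_hex_row` is illustrative only.

```python
def decode_hex_row(hex_bytes: str) -> str:
    """Render space-separated hex bytes the way the log's ASCII column does:
    printable ASCII characters as-is, everything else as '.'."""
    data = bytes.fromhex(hex_bytes)  # fromhex() skips the spaces between bytes
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


# The four rows of the LMU6022 dump quoted above.
ROWS = [
    "86 17 4C 4D 55 36 30 32 32 3A 20 20 20 49 6E 74",
    "65 72 6E 61 6C 20 73 65 72 76 65 72 20 74 6F 20",
    "73 65 72 76 65 72 20 63 6F 6D 6D 75 6E 69 63 61",
    "74 69 6F 6E 73 20 66 61 69 6C 75 72 65 2E 0A",
]

print("".join(decode_hex_row(row) for row in ROWS))
# prints: ..LMU6022:   Internal server to server communications failure..
```

The "Mismatched session sequence numbers" dump further down decodes the same way.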
    
    
    This occurs on a customer system, which then crashes PW without leaving
    any core files and requires a server restart to continue. The customer
    is running PW OSF 5.0G ECO 2 on Digital UNIX 3.2G.
    
    The customer has debug turned on. This system has been crashing under
    both 5.0F and 5.0G ECO 2 since it was included in an NT domain.
    
    The customer is also seeing a lot of:
    NET3100:   The operation failed because a network software error occurred.
    00 00 4C 69 73 74 65 6E 65 72 20 54 61 73 6B 3A        ..Listener Task:
    20 4D 69 73 6D 61 74 63 68 65 64 20 73 65 73 73         Mismatched sess
    69 6F 6E 20 73 65 71 75 65 6E 63 65 20 6E 75 6D        ion sequence num
    62 65 72 73 0A                                         bers.
    
    but I believe that this is normal on a network with NT systems.
    
    The customer has approximately 250 users connected to the system and
    has maxclients set to 400. The debug file is showing t_errno 9 and 7.
    
    PS: Is there any information on what the debug file names mean, i.e.
    which processes created them?
    
    	Trev UK CSC.

T.R	Title	User	Personal Name	Date	Lines
1859.1		EVTAI1::5dhcp17.evt.dec.com::castello	UNIX PCI Support	Fri May 23 1997 15:10	16
	Hi Trev,

	Could you post your debug files somewhere on an FTP machine? The debug
	files (e.g. Debug-1029) only give the PID of the process. If the process
	crashes, there is no way to know its name; only reading the contents of
	the file can help. If the system crashes, you can retrieve the name and
	the PID from the crash trace.

	Have you set up the NetBIOS limit according to maxclients?

	Jean-Pierre
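Since a debug file name carries only a PID, here is a minimal Python sketch that pulls the PID out of names like `Debug-1029` and checks whether that process still exists. It assumes the `Debug-<pid>` naming seen in the example; the helper names are hypothetical. A missing process only tells you it has exited, not what it was (and the PID may since have been reused), so the file contents or a crash trace are still needed to identify it.

```python
import os
import re
import sys
from typing import Optional

# Assumption: debug files are named "Debug-<pid>", as in the Debug-1029 example.
DEBUG_NAME = re.compile(r"^Debug-(\d+)$")


def pid_from_debug_name(path: str) -> Optional[int]:
    """Return the PID encoded in a debug file name, or None if it doesn't match."""
    m = DEBUG_NAME.match(os.path.basename(path))
    return int(m.group(1)) if m else None


def process_exists(pid: int) -> bool:
    """True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permissions; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Hypothetical usage: python debug_pids.py <directory holding the debug files>
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in sorted(os.listdir(directory)):
        pid = pid_from_debug_name(name)
        if pid is not None:
            state = "running" if process_exists(pid) else "gone"
            print(f"{name}: pid {pid} {state}")
```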
1859.2	T_errno = 9 / 7	KERNEL::BURNST		Tue May 27 1997 11:52	47
	Jean-Pierre,

	Here is some further info (I have raised an IPMT as well).

	The problems started when the NT domain the systems run in was enlarged
	to become the main PC servers for YWS, which is migrating from VAXes
	running PW V4.2. When the problem was first noticed, the customer's
	system was running PW 5.0F ECO 2.

	When this system went down, it would send out the goodbye message to all
	PCs on the network running WinPopup. This did not happen once, as
	expected: it has been seen to happen up to 700 times (worst case) and
	about 200 times on average, so users had the message blasted to their
	screens. BYEMESSAGE=NO has now been added to lanman.ini as part of the
	upgrade to 5.0G ECO 2.

	The customer believes there is a link between these crashes and adding
	batches of users on the NT PDC: there have been 4 crashes, and each can
	be traced back to when the IT department was adding a batch of new users
	to the domain. The customer is now having users added outside prime shift
	to see if this makes a difference, but no extra users have been added
	yet, so we can't tell the effect yet.

	The debug file only seemed to show the following as a problem:

	listener.c 385 16:34:13 L0: Likely a rebooted client on lmx.srv 24229
	listener.c 408 16:34:13 L0: Send DELETECLIENT msg to server
	listener.c 427 16:34:13 L0: Wait for client to be deleted...
	listener.c 436 16:35:32 L0: Got response from lmx.srv
	listener.c 489 16:35:32 L0: Pass client to existing server process
	listener.c 757 16:35:32 L0: Connection request on fd 65
	listener.c 646 16:35:32 L0: Attempting accept: index 0 net 1, fd 65, acc_fd 2, seq# 352
	osftcp.c 1026 16:35:32 L0: t_accept (65, 2) failed, t_errno = 9
	osftcp.c 1044 16:35:32 L0: T_LISTEN received on fd 65
	listener.c 646 16:35:32 L0: Attempting accept: index 1 net 1, fd 65, acc_fd 8, seq# 175
	osftcp.c 1026 16:35:32 L0: t_accept (65, 8) failed, t_errno = 9
	osftcp.c 1044 16:35:32 L0: T_LISTEN received on fd 65
	listener.c 646 16:35:32 L0: Attempting accept: index 2 net 1, fd 65, acc_fd 9, seq# 439
	osftcp.c 1026 16:35:32 L0: t_accept (65, 9) failed, t_errno = 9

	It then carried on giving t_errno = 9 / 7 until the server was restarted,
	as users could no longer work.

	Trev UK CSC.
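The failures in this log are t_accept() calls, which report errors through the XTI t_errno variable. Assuming Digital UNIX uses the standard X/Open XTI values from <xti.h> (worth checking against the system's own header), 9 is TLOOK and 7 is TBADSEQ. TLOOK from t_accept() means an asynchronous event is pending on the listening endpoint, which appears consistent with the "T_LISTEN received on fd 65" line after each failure; TBADSEQ means an invalid sequence number was passed, which would fit the "Mismatched session sequence numbers" errors in the first note. A minimal Python sketch for tallying these failures in a debug file follows; the regular expression assumes the exact message format in the excerpt above, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Partial table of standard X/Open XTI t_errno values from <xti.h>.
# Assumption: Digital UNIX uses these standard values; check its own header.
XTI_ERRNO_NAMES = {
    4: "TBADF",      # illegal transport file descriptor
    6: "TOUTSTATE",  # call issued in the wrong state
    7: "TBADSEQ",    # invalid sequence number (e.g. passed to t_accept)
    8: "TSYSERR",    # system error; see errno
    9: "TLOOK",      # an asynchronous event on the endpoint needs attention
}

# Matches e.g. "t_accept (65, 2) failed, t_errno = 9" as it appears in the log.
T_ACCEPT_FAILED = re.compile(r"t_accept \((\d+), (\d+)\) failed, t_errno = (\d+)")


def count_accept_failures(lines):
    """Count failed t_accept() calls in debug-log lines, keyed by t_errno name."""
    counts = Counter()
    for line in lines:
        m = T_ACCEPT_FAILED.search(line)
        if m:
            code = int(m.group(3))
            counts[XTI_ERRNO_NAMES.get(code, f"t_errno {code}")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "osftcp.c 1026 16:35:32 L0: t_accept (65, 2) failed, t_errno = 9",
        "osftcp.c 1044 16:35:32 L0: T_LISTEN received on fd 65",
        "osftcp.c 1026 16:35:32 L0: t_accept (65, 8) failed, t_errno = 9",
        "osftcp.c 1026 16:35:32 L0: t_accept (65, 9) failed, t_errno = 9",
    ]
    print(count_accept_failures(sample))  # Counter({'TLOOK': 3})
```

To scan a whole debug file, pass the open file object, e.g. `count_accept_failures(open("Debug-1029"))`.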
1859.3		CPEEDY::HORGAN		Wed May 28 1997 17:28	7
	It's good that you opened an IPMT. The server got a pipe error on the
	pipe that it used to communicate with the lmx.ctrl process. The most
	likely reason is that the lmx.ctrl process died. You probably don't have
	enough knb sessions configured, because the NT systems are consuming them
	with browsing sessions. Engineering has been working on improvements in
	lmx.ctrl for handling a lot of simultaneous incoming session requests.

	Julia
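The session-shortage diagnosis is a capacity argument: NT systems open NetBIOS (knb) sessions for browsing, and those come out of the same pool the PC clients need, so a limit sized only to maxclients can run out. A minimal back-of-the-envelope sketch in Python, assuming (as a simplification) that every session of either kind takes one slot; the function name and the browsing-session count are hypothetical, and the real limit is whatever knb session setting the system uses.

```python
def session_headroom(session_limit: int, max_clients: int, browse_sessions: int) -> int:
    """Spare NetBIOS session slots once max_clients PC sessions and the NT
    systems' browsing sessions are all open. Negative means the limit is too
    low. Assumes (simplification) one slot per session of either kind."""
    return session_limit - (max_clients + browse_sessions)


# maxclients = 400 comes from the original note; the other numbers are made up.
print(session_headroom(session_limit=400, max_clients=400, browse_sessions=30))  # -30
print(session_headroom(session_limit=500, max_clients=400, browse_sessions=30))  # 70
```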