
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

84.0. "Is this a Bug or is it me ?" by PILOU::BONGARTZ (Huckleberry Finn, I presume ?) Tue Mar 27 1990 05:56

T.R	Title	User	Personal Name	Date	Lines
84.1	INFO - are you running EFT kit?	GOSTE::CALLANDER		`Tue Mar 27 1990 18:49`	10
	Hi, You have hit upon some of the problems that we are currently working on. I would be interested in knowing if you are running the EFT kit. Especially the component version numbers of the DECnet NODE4 Access Module, the TRM Presentation Module, and the base system. Thanks for the additional information.
84.2	All T1.0.0 ...	PILOU::BONGARTZ	Huckleberry Finn, I presume ?	`Wed Mar 28 1990 09:55`	7
	> kit. Especially the component version numbers of the DECnet NODE4
	> Access Module, the TRM Presentation Module, and the base system.
	All three Component Versions are T1.0.0 ... (my workaround now is to exit and re-run mcc if it takes more than 45 seconds for a poll...)
84.3	if you find another goods ones...	GOSTE::CALLANDER		`Wed Mar 28 1990 20:27`	12
	Thanks for the additional input. We will see what can be done. If you hit any other commands that go up at such a nice rate it would be useful if you posted them here. Since different commands go through different paths in the system, sometimes something that looks like a small leak on one command, turns out to be something major given another command. jill
84.4	got one! (or two?)	PILOU::BONGARTZ	Huckleberry Finn, I presume ?	`Fri Mar 30 1990 09:14`	28
	> -< if you find another goods ones... >- Got another one... in my original polling loop, I also checked the counters on the local node (GABIN). Each poll created a SERVER_xxxx process, which apparently terminated after ca 5 minutes... but as the commands were given in less time than that, the system filled up with these processes... and ended up doing nothing but paging and swapping. Another thing, though it might not be due to me, MCC or whatever else - "just a coincidence ?" : I started my poll server in the afternoon before leaving work, and left it running overnight, polling all the routers here in Valbonne. During the night, the whole network went down - systems crashed, etc. The last output from my server was at 03:13, and about that time the problems occurred. Whether my code crashed because of the problem, or the problem occurred because of the polls, is not clear to me - but if it's due to MCC or my server (no privs!), we'd better make sure this doesn't happen on a customer network.. I'll let the thing run tonight and let you know if the net goes down the drain again. Regards, Marc.
84.5	Thanks for the additional information	PETE::BURGESS		`Fri Mar 30 1990 13:55`	36
	You have presented several problems to us which have been assigned to different engineers for resolution.
	1) The reserved operand fault which occurs when MCC is executed as a sub-process assigning sys$input/output to mail-boxes. This seems like a contained problem. I will try to reproduce your experiment here and diagnose the problem. Would you send me the exact commands which you used to create the mcc sub-process and the commands used for communicating with the sub-process? (enet: Pete::Burgess)
	2) Virtual memory expansion. This is probably due to "vm leaks". We have instrumented test versions of MCC with diagnostic tools for recording vm deallocation problems, and have been testing this problem since December, and have fixed many problems. Our focus has probably been on the normal successful operations and the most common error paths. My hypothesis is that MCC is taking some error paths without properly terminating its requested operations. We will be trying to reproduce this problem with our instrumented version of MCC.
	3) The performance problems: The DECnet phase 4 project leader will be contacting you to obtain more diagnostic information. My first concerns relate to the large number of nml servers which are being created on your routing servers.
	\Pete Burgess
84.6	Reduce NETSERVER$TIMEOUT to dump processes	TOOK::CAREY		`Fri Mar 30 1990 15:25`	26
	The only way we can see MCC "bringing down the network" is by applying huge loads on all of the routing nodes in the network. If we put enough pressure on them in terms of excessive NETSERVERs, it is conceivable that they will be unable to perform normal network communications. As soon as that happens, the routing traffic increases dramatically because the routers are trying to understand the topology. If you've got an appreciable number of routers, the network degrades rapidly. So, the first thing to do is get rid of the excessive NETSERVERS. We don't know why you spawn a new server with each connection. But until we do, you can at least cut down on the number of server processes that are out there by setting the NETSERVER process timeout lower. Do this by setting the system logical NETSERVER$TIMEOUT to just a few seconds instead of the default of around five minutes. You'll still suffer the process creation overhead, but at least you won't get the swapping and paging that you're seeing. Hope this helps, and I'll give you more on this server problem as soon as I can find out more. -Jim Carey
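	As a rough back-of-the-envelope illustration of why a lower timeout helps, the sketch below estimates how many idle server processes pile up on one node if every poll really does spawn a new server. The function name and the 30-second poll interval are assumptions made for illustration only; the thread says just that polls arrived more often than the roughly five-minute default timeout. This models the arithmetic, not DECnet or MCC behaviour.

```python
import math


def peak_servers(poll_interval_s: float, server_timeout_s: float) -> int:
    """Steady-state number of lingering servers on one node.

    Assumes (hypothetically) that every poll spawns a fresh server and
    that each server exits server_timeout_s seconds after it is created.
    """
    if poll_interval_s <= 0 or server_timeout_s <= 0:
        raise ValueError("interval and timeout must be positive")
    # Servers spawned within the last server_timeout_s seconds are still alive.
    return max(1, math.ceil(server_timeout_s / poll_interval_s))


if __name__ == "__main__":
    # Roughly five-minute timeout, hypothetical 30-second poll interval.
    print(peak_servers(30, 300))
    # Timeout cut to a few seconds: each server is gone before the next poll.
    print(peak_servers(30, 5))
```

	Under these assumptions the process count scales with timeout divided by poll interval, so shortening the timeout removes the pile-up but, as noted above, not the per-poll process creation cost.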
84.7	We Can't Reproduce Multiple Server Problems	TOOK::CAREY		`Mon Apr 02 1990 16:03`	61
	Marc, I had a chance to do some experimenting on our network here, and was unable to reproduce a situation where multiple servers were spawned and weren't expected. Any details that you could give me about the exact nature of your requests could help, although I can't imagine what might be different about them. I created and checked out the following cases:
	- Connecting to a remote node with Proxy Access defined. This worked fine. Subsequent requests connected to the spawned server.
	- Connecting to a remote node using explicit access (BY USER = "...") This also worked fine. I did these close together, so the Proxy Server was still out there, and a new server was created for the explicit access case. This is normal because VMS has to consider them to be different processes with different rights. As expected, subsequent requests connected to the same server just spawned.
	- Connecting to a remote node using Default Access (no proxy, no explicit accounting information) This worked as expected too. After forming this connection, I had three servers running: one for the Proxy access, one for the Explicit Access, and one for the Default Access. Subsequent requests didn't spawn any new servers.
	In fact, once I had the three servers running, I attempted to confuse the system by using Proxy, Explicit, and Default Access in different combinations. No problems were encountered, and no additional processes were spawned (by the way, connecting to an existing server cuts down the response to a circuit counters request from an estimated fifteen seconds, to two or three seconds maximum).
	We also tried to reproduce the problem on a boundary condition. You mentioned that your servers were set up to last about five minutes and that you were requesting counters about every five minutes. We wondered if the server process could somehow get locked up if a request came in just as it was being stopped. Several attempts to cause this to happen were unsuccessful.
Since you appear to reproduce this problem at will, we don't expect that the problem lies on that boundary. We still suspect that there is something funny about the NETSERVER processes that you are creating and will continue to pursue that angle. I hope that isolating and changing the appropriate network, system, or account parameters will clean up these servers and get your connections behaving more closely to what we expect. -Jim Carey
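	The expected behaviour described in this reply (one server per kind of access, reused by later requests) can be pictured with the toy model below. It is only a sketch of the bookkeeping: the class, the method, the node name ROUTER, and the SERVER_nnnn naming are all hypothetical, and nothing here reflects how VMS or DECnet actually implements server reuse.

```python
from dataclasses import dataclass, field


@dataclass
class ServerPool:
    """Toy model: one lingering server per (node, access type) pair."""
    servers: dict = field(default_factory=dict)
    spawned: int = 0

    def connect(self, node: str, access: str) -> str:
        key = (node, access)
        if key not in self.servers:
            # No matching server is waiting, so a new one is created.
            self.spawned += 1
            self.servers[key] = f"SERVER_{self.spawned:04d}"
        # Otherwise the request reuses the existing server.
        return self.servers[key]


if __name__ == "__main__":
    pool = ServerPool()
    # Proxy, explicit, and default access, then the same three mixed up.
    for access in ["proxy", "explicit", "default",
                   "default", "proxy", "explicit", "proxy"]:
        pool.connect("ROUTER", access)
    print(pool.spawned)  # three servers in this model, however requests are mixed
```

	The behaviour reported in .4 would correspond to the lookup never finding an existing server, so that every request takes the creation branch.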
84.8	Defective Bridge responsible for Network problems	TOOK::CAREY		`Tue Apr 03 1990 14:20`	11
	Just a little added detail: While MCC was under suspicion of "bringing down the network" it appears that a defective bridge was the real culprit this time. We are still investigating the problems described in this note, but there are no grounds to fear that DECmcc will topple your network. -Jim Carey
84.9	Any progress on increasing response time problem?	DSTEG1::MCCANN		`Wed May 09 1990 13:40`	6
	Has the problem of the ever-increasing response times mentioned in .0 been solved, or its cause identified? If so, will it be fixed in EFT update? Jack
84.10	leaks being plugged	GOSTE::CALLANDER		`Wed May 09 1990 20:10`	24
	There were two things at work in the problems reported. The defective bridge was the cause of the crash and most of the "slow down" that was experienced. The other problem was due to some memory leaks (causing fragmentation of memory when run for extended periods of time), and the dictionary lookup overhead. For EFT update we have made quite a few advances in our memory management by implementing a local cache for the allocation and deallocation of temporary memory; a better caching algorithm for the dictionary lookups was implemented in the EFT release, and fine tuned for EFT update; quite a few leaks were plugged; and some of the slower code paths have been reviewed and condensed to provide a faster end user response time. So far people with early integration releases of the base system changes have been very pleased with the enhancements. I hope you are too. But we are not stopping there; work on performance and memory management is continuing. jill