Conference bulova::decw_jan-89_to_nov-90

Title:	DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice:	See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator:	STAR::VATNE

Created:	Mon Oct 30 1989
Last Modified:	Mon Dec 31 1990
Last Successful Update:	Fri Jun 06 1997
Number of topics:	3726
Total number of notes:	19516

2059.0. "V2 Performance problem" by NITMOI::PESENTI (Only messages can be dragged) Wed Jan 17 1990 16:18

A bit of history first:

Our group has a bunch of VS2000s (2 RD54s, 1 TK50, 6 MB).
Our cluster has an 8800, an 8650, and some other assorted systems.
Most people run DECwindows applications remotely with the process running on
the 8800.

We were running VMS V5.2 on the workstations, and VMS V5.1 on the cluster.  Then
the worm invaded, and everything was REBUILT from scratch with VMS V5.3.  

So a lot of things have changed.

Since we started using DECwindows V2, we have noticed a consistent problem with
remote applications.  Every now and again, all remote applications on a
workstation freeze for a few seconds, then proceed.  I don't know if it happens
simultaneously on all workstations, but it certainly hits all applications for
a given workstation.  The local applications are not affected this way.  It is
almost as if the 8800 is heavily loaded, but it isn't.  LAT processes on the 
8800 don't stop and go like this.

According to our system manager, there is not too much traffic on the net when
the problems occur.  He believes it is the amount of memory on the VS2000.  I 
disagree because I can hear my system when it starts paging, and it isn't.  I
think it might be some resource required by some network software.  

Any ideas on what to look at, or similar experiences, or anything?

							Thanks
							   -JP

T.R	Title	User	Personal Name	Date	Lines
2059.1	More information...	NITMOI::PESENTI	Only messages can be dragged	`Wed Jan 17 1990 17:30`	11
	To be more specific: On the work station, there are 2 windows. One is a remote DECterm, running on the 8800. The other is a local DECterm running VWSLAT, connected to the 8800. The same directory command is kicked off in each window at the same time. The remote DECterm window has the stop and go behavior, the VWSLAT window does not. Curiouser and curiouser! -JP
2059.2	Ditto	DSTEG::HOSSFELD	I'm so confused!	`Thu Jan 18 1990 12:18`	6
	I have a similar setup running VMS 5.1 on the remote system and VMS 5.3 on the local system (3200). I run several applications from the remote system to my 3200. I am experiencing the same delays and in a couple of cases I have seen minutes go by. Paul H.
2059.3	pipeline quota too large problem again?	STAR::BMATTHEWS		`Thu Jan 18 1990 13:32`	3
	Do you also have the problems when the app runs locally? What is your DECNET pipeline quota set to? Bill
2059.4		DSTEG::HOSSFELD	I'm so confused!	`Thu Jan 18 1990 18:40`	5
	I don't appear to have any problems locally. The pipe quo on both systems is 10000. Paul H.
2059.5	NETACP?	OAHU::BEERMAN	Charlie Beerman	`Fri Jan 19 1990 11:22`	20
	I had this same problem on my standalone VSII under VMS 5.1/DW 1.0 and now on my LAVC of several VS3100's under VMS 5.3/DW 2.0. I noticed that during the delay, the system was page faulting like crazy. It was the NETACP process that was faulting. The system managers at my facility told me I needed to define the logicals NETACP$EXTENT, NETACP$MAXIMUM_WORKING_SET, and NETACP$PAGE_FILE to reasonable numbers. (The defaults are way too low.) What counts as "reasonable numbers" depends partly on how much memory you have. If you give everything to NETACP, your user processes and applications will fault a lot. My VS3100's all have 16MB memory, and I have set the above logicals to 6000, 6000, and 20000 respectively. NETACP still faults but not as much as before. If I raise the numbers even further I have hardly any memory left for users and applications.
2059.6		STAR::MCLEMAN	Hey!!! This Stall is Taken!!	`Fri Jan 19 1990 11:37`	11
	Something to look at: Is service turned on for your circuit? (Boot requests) This will cause NETACP to go nuts on an Ethernet with constant boot requests (especially from LPS40s, VAX satellites, etc.). Do a

	    $ MC NCP SHO CIR xxx-0 CHAR

	(where xxx-0 is your circuit type). Just a suggestion, Jeff
2059.7	Could it be the faulting on the cluster?	NITMOI::PESENTI	Only messages can be dragged	`Fri Jan 19 1990 11:53`	22
	Local applications work fine. The pipeline quota on my VS2000 is 10000. The pipeline quota on the 8800 is 32767.

	Here is what the systems display for NETACP:

	VAX/VMS V5.3 on node HOONOO  19-JAN-1990 08:29:31.96   Uptime  4 00:53:03
	  Pid    Process Name    State  Pri      I/O       CPU       Page flts Ph.Mem
	0000004E NETACP          HIB     10      328   0 00:02:12.17       309    344

	VAX/VMS V5.3 on node KISMIF  19-JAN-1990 08:37:10.62   Uptime  35 19:39:27
	  Pid    Process Name    State  Pri      I/O       CPU       Page flts Ph.Mem
	21400128 NETACP          HIB     10   138315   0 10:01:59.41  49217969   1500

	I'm not a performance expert, but something tells me 49 million page faults is a little much even if it has been up for a month. Some arithmetic shows this to be about 16 faults per second, on average. Probably hitting peaks a LOT higher than that. I think it is time to twiddle the NETACP$* logicals.
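The "some arithmetic" in .7 can be checked with a short Python sketch. It assumes the Uptime field reads as days followed by HH:MM:SS, and the function names are made up for illustration; the fault counts and uptimes are the ones quoted in the note.

```python
def uptime_seconds(uptime: str) -> int:
    """Convert an uptime string of the form 'D HH:MM:SS' to seconds."""
    days, clock = uptime.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


def faults_per_second(page_faults: int, uptime: str) -> float:
    """Average page-fault rate over the whole uptime."""
    return page_faults / uptime_seconds(uptime)


# Figures quoted in .7 for the NETACP process on each node.
hoonoo = faults_per_second(309, "4 00:53:03")
kismif = faults_per_second(49217969, "35 19:39:27")

print(f"HOONOO: {hoonoo:.4f} faults/s")
print(f"KISMIF: {kismif:.1f} faults/s")
```

This is only the long-run average. As the note says, the peaks during a stall could be far higher, and an average cannot show that.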
2059.8	how to reduce pagefaulting w/ service enabled	DEMON3::CLEVELAND	Notes - fun or satanic cult?	`Fri Jan 19 1990 12:58`	8
	That's pretty high all right. I'll bet the 8800 has service enabled to boot the workstations? You can reduce the load if you define the most frequent boot-requestors in area 0 (i.e., no area). Even if you don't have the boot file for these nodes you will reduce the page faulting. Tim ps> Hi JP
2059.9	pipeline quota too high on the 8800?	STAR::BMATTHEWS		`Mon Jan 22 1990 11:33`	4
	It may be that the pipeline quota is too high on the 8800. Does anyone remember what values of pipeline quota are too high? Was it anything larger than 64k-576 or 32k-576??? Bill
2059.10		STAR::MFOLEY	Rebel Without a Clue	`Mon Jan 22 1990 12:15`	5
	I think anything over 32767 is too high for pipeline quota.. mike
2059.11	64K-576	IOUONE::BRYSON		`Tue Jan 23 1990 18:25`	5
	Under V1.0, the pipeline quota could not be over 64K-576 or performance would suffer. David
2059.12		QUARK::LIONEL	Free advice is worth every cent	`Tue Jan 23 1990 20:14`	4
	Boy do I remember the pipeline quota problem, but 32767 would not hit it. The magic value is indeed 65535-576. Anything more than 25000 is pointless. Steve
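For reference, the pipeline quota figures quoted in .9 through .12 can be lined up in a small Python sketch. The cut-offs are taken from the replies as written (problem value 65535-576, "pointless" above 25000); whether the problem starts exactly at that value or just above it is an assumption, and the function name is hypothetical.

```python
# Values quoted in this topic, not taken from DECnet documentation.
PROBLEM_THRESHOLD = 65535 - 576   # the "magic value" from .12
POINTLESS_ABOVE = 25000           # per .12, larger values gain nothing


def assess_pipeline_quota(quota: int) -> str:
    """Classify a pipeline quota against the figures quoted in the replies."""
    if quota > PROBLEM_THRESHOLD:  # assumption: strictly above, per .11
        return "over 65535-576: in the range reported to hurt performance"
    if quota > POINTLESS_ABOVE:
        return "below the problem value, but above 25000 (no expected benefit)"
    return "within the range suggested in this topic"


for quota in (10000, 25000, 32767, 65535):
    print(f"{quota:>6}: {assess_pipeline_quota(quota)}")
```

By these figures, neither the 10000 on the VS2000 nor the 32767 on the 8800 reaches the old problem value.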
2059.13	JP isn't the only one with this problem...	WAYLAY::GORDON	It's always the freakin' dots...	`Wed Jan 24 1990 21:07`	24
	Just for the record, I'm seeing the same problems as JP in the base note. In fact, JP came over and asked me about it. I have two boot nodes of our 20-node MiVc (one 8700, one 8800) on which I start 4 terminals (2 each node) via CREATE/TERMINAL back to my standalone VS3540. (Just so no one thinks it's performance problems on the server side...) I remembered Steve Lionel's problem, and went and looked through this conference and found his original note. I dropped the pipeline quota to 25000 on both nodes (from 32767) and have seen no improvement. (WAYLAY has the standard 10,000 for pipeline quota.) Both the cluster and firefox are running vanilla VMS V5.3. I don't seem to have the problems with NETACP that JP has in the base note. Both Nautili have 96 meg and we give NETACP a generous working set. We do, however, have a lot of links as we host several popular Notes conferences. Max links is 180, max alias links is 64. The 8800 is our cluster router. I, too, would be grateful for any suggestions. --Doug
2059.14	Causing pain at a customer site! Ideas?	NCBDVX::HOHMD	Sacred cows make great steaks	`Tue Feb 06 1990 20:30`	138
	Has anyone had any luck with this? I'm seeing it on a customer site and it is really bad! 6-10 second delays every 30 seconds or so. It only happens on remote apps (like DECterm and LSE) for machines that are booted into the cluster and stand-alone machines (VS3100's). The machines the remote apps are running on are 8650's and a 6460 (there are 5 8650's, 2 6360's, 1 8840, 1 6460 and 8 VS3100's, and almost all terminal and printer traffic is over LAT).

	The pipeline quota on all the big machines has been set to 20000 and the line receive buffers have been set to 24 (a hunch from recommendations from DFS). Neither of these changes has had an effect. They have put a "Sniffer" on the Ethernet and have not seen excessive traffic (5-10% utilization) and not an excess of collisions, but they do occur.

	Any ideas? What about PQL sysgen quotas for process creation?

	Following the form feed are the executor and line characteristics from both the 6460 and one of the 3100's followed by the PQL quotas from the 6460.

	thanks in advance-
	in the trenches,
	-dale

	Here are the Executor characteristics from the 6460:
	______________________________________________________________________________
	Executor node = 1.10 (RES3)

	Identification = RES3 VAX IN JOHNSTON, IOWA
	Management version = V4.0.0
	Incoming timer = 60
	Outgoing timer = 60
	Incoming Proxy = Enabled
	Outgoing Proxy = Enabled
	NSP version = V4.1.0
	Maximum links = 96
	Delay factor = 80
	Delay weight = 5
	Inactivity timer = 60
	Retransmit factor = 10
	Routing version = V2.0.0
	Type = routing IV
	Routing timer = 600
	Broadcast routing timer = 180
	Maximum address = 100
	Maximum circuits = 48
	Maximum cost = 200
	Maximum hops = 10
	Maximum visits = 20
	Maximum area = 63
	Max broadcast nonrouters = 64
	Max broadcast routers = 32
	Maximum path splits = 1
	Area maximum cost = 1022
	Area maximum hops = 30
	Maximum buffers = 100
	Buffer size = 576
	Default access = incoming and outgoing
	Pipeline quota = 20000
	Alias maximum links = 32
	Alias node = 1.6 (PDS)
	Path split policy = Normal
	Maximum Declared Objects = 31
	______________________________________________________________________________

	Here are line characteristics from the 6460:
	______________________________________________________________________________
	Line = BNA-0

	Counter timer = 43200
	Receive buffers = 24
	Controller = normal
	Protocol = Ethernet
	Service timer = 4000
	Hardware address = 08-00-2B-11-80-DA
	Device buffer size = 1498
	______________________________________________________________________________

	Here are executor characteristics from one of the 3100's:
	______________________________________________________________________________
	Executor node = 1.85 (DECVS1)

	Identification = DECnet-VAX V5.3, VMS V5.3
	Management version = V4.0.0
	Incoming timer = 60
	Outgoing timer = 60
	Incoming Proxy = Enabled
	Outgoing Proxy = Enabled
	NSP version = V4.1.0
	Maximum links = 32
	Delay factor = 80
	Delay weight = 5
	Inactivity timer = 60
	Retransmit factor = 10
	Routing version = V2.0.0
	Type = nonrouting IV
	Routing timer = 600
	Broadcast routing timer = 180
	Maximum address = 1023
	Maximum circuits = 16
	Maximum cost = 1022
	Maximum hops = 30
	Maximum visits = 63
	Maximum area = 63
	Max broadcast nonrouters = 64
	Max broadcast routers = 32
	Maximum path splits = 1
	Area maximum cost = 1022
	Area maximum hops = 30
	Maximum buffers = 100
	Buffer size = 576
	Nonprivileged user id = DECNET
	Nonprivileged password = CUFOACTSAM
	Default access = incoming and outgoing
	Pipeline quota = 10000
	Alias maximum links = 32
	Path split policy = Normal
	Maximum Declared Objects = 31
	______________________________________________________________________________

	And finally, line characteristics from the 3100's:
	______________________________________________________________________________
	Line = SVA-0

	Receive buffers = 24
	Controller = normal
	Protocol = Ethernet
	Service timer = 4000
	Hardware address = 08-00-2B-13-07-66
	Device buffer size = 1498
	______________________________________________________________________________
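When comparing two long characteristics listings like the ones in .14, it can help to diff them mechanically. The following Python sketch assumes the listing has been saved as text with one "Name = value" pair per line; the helper names are hypothetical, and the sample data is only a subset of the values posted above.

```python
def parse_characteristics(text: str) -> dict:
    """Collect 'Name = value' lines into a dict; other lines are ignored."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # only lines that actually contain '='
            result[name.strip()] = value.strip()
    return result


def differences(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Subset of the executor characteristics posted in .14.
res3 = parse_characteristics("""
Maximum links = 96
Type = routing IV
Maximum address = 100
Buffer size = 576
Pipeline quota = 20000
""")
decvs1 = parse_characteristics("""
Maximum links = 32
Type = nonrouting IV
Maximum address = 1023
Buffer size = 576
Pipeline quota = 10000
""")

for name, (big, small) in differences(res3, decvs1).items():
    print(f"{name}: 6460={big}  3100={small}")
```

Settings that match on both nodes, such as the buffer size of 576, drop out of the report, which leaves only the differences to look at.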
2059.15	Same problem here, too!	ASD::LOW	Member - American Autobahn Society	`Wed Feb 07 1990 13:38`	14
	Are there any old MVII's on that net? We're having the same problem here. Plenty of CPU, relatively low Ethernet usage, neither the WS or the main CPUs paging excessively, the NETACP processes have only a few hundred page faults, and yet we still get periodic "hangs" running remote applications. I was guessing that the DEQNA on some of the MVII's we have was causing a problem, but that may not be the case... I too have no clue why this is happening... Any additional ideas would be appreciated! (I'm in ZK) Dave
2059.16	Could the DEQNA be the problem?	AIRBAG::SWATKO	Electrons are cheap. Trees are not.	`Thu Feb 08 1990 15:58`	10
	> I was guessing that the DEQNA on some of the MVII's we have was causing
	> a problem, but that may not be the case... I too have no clue why this
	> is happening...

	If I remember correctly (and I may be way off base on this one), there was something about DEQNAs that caused problems. A while back, we were all told to upgrade our DEQNAs to DELQAs. Maybe that has something to do with it. Anyone know anything about this? -Mike
2059.17		DECWIN::JMSYNGE	James M Synge, VMS Development	`Mon Feb 12 1990 13:31`	7
	I would recommend looking at the activity on the client nodes, particularly at NETACP. It may be that some activity is occurring which is causing NETACP to fail to maintain the connections. Try a MONITOR PROC/TOPCPU or MONITOR DECNET. James
2059.18	Hmmm...router problem.	ASD::LOW	Member - American Autobahn Society	`Tue Feb 13 1990 17:20`	14
	Re: -.1 I've done that - nothing seems out of the ordinary - 30-60% CPU utilization - with NETACP not even breaking into the double digits percentage wise... Looking at DECNET shows packet I/O rates of 20-40...well below the capacity of our Ethernet controller... However, I have noticed that this phenomenon seems to occur only on our cluster router (a level IV router). Might this be the cause? Dave
2059.19		DECWIN::JMSYNGE	James M Synge, VMS Development	`Wed Feb 14 1990 15:49`	7
	Dave, I'd recommend running MONITOR in record mode on the cluster router. Then check the data it has recorded around the time of the next connection abort event. I suspect you'll see high activity levels. James
2059.20	update on CREATE/TERM delays over DECnet?	CSC32::FORSMAN	Ginny Forsman 522-4731 CSC/CS	`Fri Apr 27 1990 22:53`	27
	Has anyone come up with a resolution for this problem? My customer is seeing the same behavior as .0. VS3100, 16 MB, local page/swap disk, booting off a 6430, 96 MB, 5.3. I have not been able to duplicate this here.

	From the client:

	    SET DISPLAY/CREATE/NODE=vs3100
	    CREATE/DET/TERM

	and in the DECterm they run WPS, or ALL-IN-1, or EDT. They find a 10-20 second delay every minute or so. They can also duplicate this running DECW$EXAMPLES:ICO.

	Their NETACP logicals are not defined. Did a SHOW SYSTEM, and the NETACP process had only 246 pgflts. Again, all works fine using local transport.

	All help appreciated. Thanks, Ginny Forsman CSC/CS (not a DECnet person, sorry...)
2059.21		PSW::WINALSKI	Careful with that VAX, Eugene	`Sat Apr 28 1990 21:02`	9
	RE: .20 The delays sound like a network timeout and then retransmission of a lost data packet. Go into NCP on both machines and do SHOW CIRCUIT COUNT for the circuit being used, SHOW LINE COUNT for the line being used, and SHOW NODE COUNT for the nodes involved. Look for non-zero timeout, buffer unavailable, and other such errors. This definitely sounds like a network problem. --PSW
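The check suggested in .21 (look for non-zero timeout, buffer unavailable, and similar counters) can be sketched in Python for counter output captured to a file. This assumes each counter line is a number followed by a description; the keyword list and function name are hypothetical, and the sample values below are invented for illustration, not taken from any system in this topic.

```python
import re

# Hypothetical keyword list; adjust to the counters you care about.
SUSPECT_WORDS = ("timeout", "unavailable", "failure", "error", "loss")

# Assumed line shape: optional '>' (counter overflow), a number, a description.
COUNTER_LINE = re.compile(r"^\s*>?(\d+)\s+(.*\S)\s*$")


def suspicious_counters(text: str) -> list:
    """Return (description, value) for non-zero counters that look like errors."""
    hits = []
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if not match:
            continue  # headers, blank lines, anything not shaped like a counter
        value, name = int(match.group(1)), match.group(2)
        if value and any(word in name.lower() for word in SUSPECT_WORDS):
            hits.append((name, value))
    return hits


# Invented sample, shaped like counter output; the numbers are not real data.
SAMPLE = """
Circuit = SVA-0

       65534  Seconds since last zeroed
      104733  Terminating packets received
           0  Terminating congestion loss
          12  User buffer unavailable
           3  Response timeouts
"""

for name, value in suspicious_counters(SAMPLE):
    print(f"{value:>8}  {name}")
```

Running the same scan on both machines, before and after a stall, would show which counters actually move during the delay.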