
Conference bulova::decw_jan-89_to_nov-90

Title: DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice: See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator: STAR::VATNE
Created: Mon Oct 30 1989
Last Modified: Mon Dec 31 1990
Last Successful Update: Fri Jun 06 1997
Number of topics: 3726
Total number of notes: 19516

2059.0. "V2 Performance problem" by NITMOI::PESENTI (Only messages can be dragged) Wed Jan 17 1990 16:18

A bit of history first:

Our group has a bunch of VS2000s, each with 2 RD54s, 1 TK50, and 6 MB.
Our cluster has an 8800, an 8650, and some other assorted systems.  
Most people run DECwindows applications remotely with the process running on
the 8800.

We were running VMS V5.2 on the workstations, and VMS V5.1 on the cluster.  Then
the worm invaded, and everything was REBUILT from scratch with VMS V5.3.  

So a lot of things have changed.

Since we have started using DECwindows V2, we notice a consistent problem on the
remote applications.  Every now and again, all remote applications on a
workstation freeze for a few seconds, then proceed.  I don't know if it happens
simultaneously on all workstations, but it certainly hits all applications for
a given workstation.  The local applications are not affected this way.  It is
almost as if the 8800 is heavily loaded, but it isn't.  LAT processes on the 
8800 don't stop and go like this.

According to our system manager, there is not too much traffic on the net when
the problems occur.  He believes it is the amount of memory on the VS2000.  I 
disagree because I can hear my system when it starts paging, and it isn't.  I
think it might be some resource required by some network software.  

Any ideas on what to look at, or similar experiences, or anything?

							Thanks
							   -JP 
T.R  Title  User  Personal Name  Date  Lines
2059.1. "More information..." by NITMOI::PESENTI (Only messages can be dragged) Wed Jan 17 1990 17:30 (11 lines)

To be more specific:

On the work station, there are 2 windows.  One is a remote DECterm, running on
the 8800.  The other is a local DECterm running VWSLAT, connected to the 8800.

The same directory command is kicked off in each window at the same time.  The
remote DECterm window has the stop and go behavior, the VWSLAT window does not.

Curiouser and curiouser!

					-JP
2059.2. "Ditto" by DSTEG::HOSSFELD (I'm so confused!) Thu Jan 18 1990 12:18 (6 lines)

I have a similar setup running VMS 5.1 on the remote system and VMS 5.3 on the 
local system (3200). I run several applications from the remote system to my
3200. I am experiencing the same delays and in a couple of cases I have seen
minutes go by.

Paul H.
2059.3. "pipeline quota too large problem again?" by STAR::BMATTHEWS Thu Jan 18 1990 13:32 (3 lines)

Do you also have the problems when the app runs locally?
What is your DECNET pipeline quota set to?
					Bill
2059.4. by DSTEG::HOSSFELD (I'm so confused!) Thu Jan 18 1990 18:40 (5 lines)

I don't appear to have any problems locally. 

The pipeline quota on both systems is 10000.

Paul H.
2059.5. "NETACP?" by OAHU::BEERMAN (Charlie Beerman) Fri Jan 19 1990 11:22 (20 lines)

   I had this same problem on my standalone VSII under VMS 5.1/DW 1.0
   and now on my LAVC of several VS3100's under VMS 5.3/DW 2.0.

   I noticed that during the delay, the system was page faulting like
   crazy.  It was the NETACP process that was faulting.  The system
   managers at my facility told me I needed to define logicals for 

   NETACP$EXTENT
   NETACP$MAXIMUM_WORKING_SET
   NETACP$PAGE_FILE

   to reasonable numbers. (The defaults are way too low.)  What are
   "reasonable numbers" depends partly on how much memory you have.  If
   you give everything to NETACP your user processes and applications
   will fault a lot.

   My VS3100's all have 16MB memory, and I have set the above logicals
   to 6000, 6000, and 20000 respectively.  NETACP still faults but not
   as much as before.  If I raise the numbers even further I have hardly
   any memory left for users and applications.
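
To put those numbers in perspective, here is a rough sketch that converts the three settings above into megabytes. It assumes the values are counted in 512-byte VAX pages and that the machine has the 16 MB mentioned above; the helper name is made up for illustration.

```python
VAX_PAGE_BYTES = 512  # assumption: the logicals are expressed in 512-byte VAX pages


def pages_to_mb(pages: int) -> float:
    """Convert a page count into megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * VAX_PAGE_BYTES / (1024 * 1024)


# Values quoted in this note for a 16 MB VS3100.
settings = {
    "NETACP$EXTENT": 6000,
    "NETACP$MAXIMUM_WORKING_SET": 6000,
    "NETACP$PAGE_FILE": 20000,
}
total_memory_mb = 16

for name, pages in settings.items():
    print(f"{name:<28} {pages:>6} pages = {pages_to_mb(pages):5.2f} MB")

# Share of physical memory that the working set ceiling could occupy.
ws_share = pages_to_mb(settings["NETACP$MAXIMUM_WORKING_SET"]) / total_memory_mb
print(f"working set ceiling is about {ws_share:.0%} of {total_memory_mb} MB")
```

This only shows the ceiling NETACP is allowed to reach, not what it actually uses at any moment.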
2059.6. by STAR::MCLEMAN (Hey!!! This Stall is Taken!!) Fri Jan 19 1990 11:37 (11 lines)

Something to look at:

 Is service turned on for your circuit? (Boot requests)
	This will cause NETACP to go nuts on an Ethernet with constant boot
	requests. (Especially from LPS40s, VAX satellites, etc.)

Do a $ MC NCP SHO CIR xxx-0 CHAR  (where xxx-0 is your circuit type)

Just a suggestion,

Jeff
2059.7. "Could it be the faulting on the cluster?" by NITMOI::PESENTI (Only messages can be dragged) Fri Jan 19 1990 11:53 (22 lines)

Local applications work fine.

The pipeline quota on my VS2000 is 10000.

The pipeline quota on the 8800 is 32767.

Here is what the systems display for NETACP:

VAX/VMS V5.3  on node HOONOO  19-JAN-1990 08:29:31.96   Uptime  4 00:53:03
  Pid    Process Name    State  Pri      I/O       CPU       Page flts Ph.Mem
0000004E NETACP          HIB     10      328   0 00:02:12.17       309    344

VAX/VMS V5.3  on node KISMIF  19-JAN-1990 08:37:10.62   Uptime 35 19:39:27
  Pid    Process Name    State  Pri      I/O       CPU       Page flts Ph.Mem
21400128 NETACP          HIB     10   138315   0 10:01:59.41  49217969   1500

I'm not a performance expert, but something tells me 49 million page faults is
a little much even if it has been up for a month.  Some arithmetic shows this
to be about 16 faults per second, on average.  Probably hitting peaks a LOT 
higher than that.

I think it is time to twiddle the NETACP$* logicals.
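
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the average fault rate from the two displays above. The uptimes and fault counts are copied from those displays; the function and variable names are just for illustration.

```python
def uptime_seconds(days: int, hms: str) -> int:
    """Convert an uptime shown as days plus 'HH:MM:SS' into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# Figures copied from the two NETACP displays above.
nodes = {
    "HOONOO": {"uptime": uptime_seconds(4, "00:53:03"), "faults": 309},
    "KISMIF": {"uptime": uptime_seconds(35, "19:39:27"), "faults": 49217969},
}

# Average NETACP page faults per second since boot, per node.
rates = {name: d["faults"] / d["uptime"] for name, d in nodes.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4f} NETACP page faults per second on average")
```

An average since boot says nothing about the peaks, which is the point made above.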
2059.8. "how to reduce pagefaulting w/ service enabled" by DEMON3::CLEVELAND (Notes - fun or satanic cult?) Fri Jan 19 1990 12:58 (8 lines)

That's pretty high all right.  I'll bet the 8800 has service enabled to boot
the workstations?  You can reduce the load if you define the most frequent
boot-requestors in area 0 (i.e., no area).  Even if you don't have the boot
file for these nodes you will reduce the page faulting.

Tim

ps> Hi JP
2059.9. "pipeline quota too high on the 8800?" by STAR::BMATTHEWS Mon Jan 22 1990 11:33 (4 lines)

It may be that the pipeline quota is too high on the 8800. Does anyone
remember what values of pipeline quota are too high? Was it anything larger
than 64k-576 or 32k-576???
							Bill
2059.10. by STAR::MFOLEY (Rebel Without a Clue) Mon Jan 22 1990 12:15 (5 lines)

	I think anything over 32767 is too high for pipeline quota.. 

							mike
2059.11. "64K-576" by IOUONE::BRYSON Tue Jan 23 1990 18:25 (5 lines)

Under V1.0, the pipeline quota could not be over 64K-576 or performance would
suffer.

David
2059.12. by QUARK::LIONEL (Free advice is worth every cent) Tue Jan 23 1990 20:14 (4 lines)

Boy do I remember the pipeline quota problem, but 32767 would not hit it.
The magic value is indeed 65535-576.  Anything more than 25000 is pointless.

				Steve
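
As a quick sanity check on the numbers in .12, here is a sketch that classifies the pipeline quotas reported so far in this topic against the two thresholds Steve gives. The packet estimate (quota divided by the executor buffer size) is an assumption on my part about how the quota translates into buffered packets, and the function names are hypothetical.

```python
BUFFER_SIZE = 576                 # executor buffer size; also the number subtracted in .12
MAGIC_LIMIT = 65535 - BUFFER_SIZE  # 64959, the "magic value" from .12
POINTLESS_ABOVE = 25000            # per .12, anything larger buys nothing

# Pipeline quotas reported in this topic.
quotas = {
    "VS2000 (.7)": 10000,
    "8800 (.7)": 32767,
}


def assess(quota: int) -> str:
    """Classify a pipeline quota against the thresholds quoted in .12."""
    if quota > MAGIC_LIMIT:
        return "over the limit"
    if quota > POINTLESS_ABOVE:
        return "under the limit, but larger than useful"
    return "ok"


def packets_buffered(quota: int, buffer_size: int = BUFFER_SIZE) -> int:
    """Assumed estimate: whole packets that fit in the quota."""
    return quota // buffer_size


for node, quota in quotas.items():
    print(f"{node}: {quota} -> {assess(quota)}, ~{packets_buffered(quota)} packets")
```

By this reading neither value reported so far is anywhere near the problem threshold, which matches Steve's remark that 32767 would not hit it.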
2059.13. "JP isn't the only one with this problem..." by WAYLAY::GORDON (It's always the freakin' dots...) Wed Jan 24 1990 21:07 (24 lines)

	Just for the record, I'm seeing the same problems as JP in the base
note.  In fact, JP came over and asked me about it.

	I have two boot nodes of our 20-node MiVc (one 8700, one 8800) on 
which I start 4 terminals (2 each node) via CREATE/TERMINAL back to my 
standalone VS3540. (Just so no one thinks it's performance problems on 
the server side...)

	I remembered Steve Lionel's problem, and went and looked through this
conference and found his original note.  I dropped the pipeline quota to
25000 on both nodes (from 32767) and have seen no improvement.  (WAYLAY has
the standard 10,000 for pipeline quota.)  Both the cluster and firefox are
running vanilla VMS V5.3.

	I don't seem to have the problems with NETACP that JP has in the
base note.  Both Nautili have 96 meg and we give NETACP a generous working
set.  We do, however, have a *lot* of links as we host several popular
Notes conferences.  Max links is 180, max alias links is 64. The 8800 is
our cluster router.

	I, too, would be grateful for any suggestions.


					--Doug 
2059.14. "Causing pain at a customer site! Ideas?" by NCBDVX::HOHMD (Sacred cows make great steaks) Tue Feb 06 1990 20:30 (138 lines)

Has anyone had any luck with this?

I'm seeing it on a customer site and it is really bad!  6-10 second delays
every 30 seconds or so.

It only happens on remote apps (like DECterm and LSE) for machines that are
booted into the cluster and stand-alone machines (VS3100's)

The machines the remote apps are running on are 8650's and a 6460 (there are
5 8650's, 2 6360's, 1 8840, 1 6460 and 8 vs3100's and almost all terminal and
printer traffic is over LAT)

The pipeline quota on all the big machines has been set to 20000 and the
line receive buffers have been set to 24 (a hunch from recommendations from
DFS) -- Neither of these changes have had an effect.

They have put a "Sniffer" on the Ethernet and have not seen excessive traffic
(5-10% utilization) or an excess of collisions, though collisions do occur.

Any ideas? What about PQL sysgen quotas for process creation?

Following the form feed are the executor and line characteristics from both
the 6460 and one of the 3100's followed by the PQL quotas from the 6460.

thanks in advance-

in the trenches,

-dale

Here are the Executor characteristics from the 6460:
______________________________________________________________________________
Executor node = 1.10 (RES3)

Identification           = RES3 VAX IN JOHNSTON, IOWA
Management version       = V4.0.0
Incoming timer           = 60
Outgoing timer           = 60
Incoming Proxy           = Enabled
Outgoing Proxy           = Enabled
NSP version              = V4.1.0
Maximum links            = 96
Delay factor             = 80
Delay weight             = 5
Inactivity timer         = 60
Retransmit factor        = 10
Routing version          = V2.0.0
Type                     = routing IV
Routing timer            = 600
Broadcast routing timer  = 180
Maximum address          = 100
Maximum circuits         = 48
Maximum cost             = 200
Maximum hops             = 10
Maximum visits           = 20
Maximum area             = 63
Max broadcast nonrouters = 64
Max broadcast routers    = 32
Maximum path splits      = 1
Area maximum cost        = 1022
Area maximum hops        = 30
Maximum buffers          = 100
Buffer size              = 576
Default access           = incoming and outgoing
Pipeline quota           = 20000
Alias maximum links      = 32
Alias node               =  1.6 (PDS)
Path split policy        = Normal
Maximum Declared Objects = 31
______________________________________________________________________________

Here are line characteristics from the 6460:
______________________________________________________________________________

Line = BNA-0

Counter timer            = 43200
Receive buffers          = 24
Controller               = normal
Protocol                 = Ethernet
Service timer            = 4000
Hardware address         = 08-00-2B-11-80-DA
Device buffer size       = 1498
______________________________________________________________________________

Here are executor characteristics from one of the 3100's:
______________________________________________________________________________
Executor node = 1.85 (DECVS1)

Identification           = DECnet-VAX V5.3,  VMS V5.3
Management version       = V4.0.0
Incoming timer           = 60
Outgoing timer           = 60
Incoming Proxy           = Enabled
Outgoing Proxy           = Enabled
NSP version              = V4.1.0
Maximum links            = 32
Delay factor             = 80
Delay weight             = 5
Inactivity timer         = 60
Retransmit factor        = 10
Routing version          = V2.0.0
Type                     = nonrouting IV
Routing timer            = 600
Broadcast routing timer  = 180
Maximum address          = 1023
Maximum circuits         = 16
Maximum cost             = 1022
Maximum hops             = 30
Maximum visits           = 63
Maximum area             = 63
Max broadcast nonrouters = 64
Max broadcast routers    = 32
Maximum path splits      = 1
Area maximum cost        = 1022
Area maximum hops        = 30
Maximum buffers          = 100
Buffer size              = 576
Nonprivileged user id    = DECNET
Nonprivileged password   = CUFOACTSAM
Default access           = incoming and outgoing
Pipeline quota           = 10000
Alias maximum links      = 32
Path split policy        = Normal
Maximum Declared Objects = 31
______________________________________________________________________________

And finally, line characteristics from the 3100's
______________________________________________________________________________
Line = SVA-0

Receive buffers          = 24
Controller               = normal
Protocol                 = Ethernet
Service timer            = 4000
Hardware address         = 08-00-2B-13-07-66
Device buffer size       = 1498
______________________________________________________________________________
2059.15. "Same problem here, too!" by ASD::LOW (Member - American Autobahn Society) Wed Feb 07 1990 13:38 (14 lines)

    Are there any old MVII's on that net?  We're having the same problem
    here.  Plenty of CPU, relatively low Ethernet usage, neither the WS
    nor the main CPUs paging excessively, the NETACP processes have only a 
    few hundred page faults, and yet we still get periodic "hangs" running
    remote applications. 
    
    I was guessing that the DEQNA on some of the MVII's we have was causing 
    a problem, but that may not be the case...  I too have no clue why this
    is happening...
    
    Any additional ideas would be appreciated!  (I'm in ZK)
    
    Dave
    
2059.16. "Could the DEQNA be the problem?" by AIRBAG::SWATKO (Electrons are cheap. Trees are not.) Thu Feb 08 1990 15:58 (10 lines)

>    I was guessing that the DEQNA on some of the MVII's we have was causing 
>    a problem, but that may not be the case...  I too have no clue why this
>    is happening...

If I remember correctly (and I may be way off base on this one), there was
something about DEQNAs that caused problems.  A while back, we were all told
to upgrade our DEQNAs to DELQAs.  Maybe that has something to do with it.
Anyone know anything about this?

-Mike
2059.17. by DECWIN::JMSYNGE (James M Synge, VMS Development) Mon Feb 12 1990 13:31 (7 lines)

    I would recommend looking at the activity on the client nodes. 
    Particularly at NETACP.  It may be that some activity is occurring which
    is causing NETACP to fail to maintain the connections.
    
    Try a MONITOR PROC/TOPCPU or MONITOR DECNET.
    
    James
2059.18. "Hmmm...router problem." by ASD::LOW (Member - American Autobahn Society) Tue Feb 13 1990 17:20 (14 lines)

    Re: -.1
    
    I've done that - nothing seems out of the ordinary - 30-60% CPU
    utilization - with NETACP not even breaking into the double digits
    percentage wise...
    
    Looking at DECNET shows packet I/O rates of 20-40...well below the
    capacity of our Ethernet controller... 
    
    However, I have noticed that this phenomenon seems to occur only on
    our cluster router (a level IV router).  Might this be the cause?
    
    Dave
    
2059.19. by DECWIN::JMSYNGE (James M Synge, VMS Development) Wed Feb 14 1990 15:49 (7 lines)

    Dave,
    	
    	I'd recommend running MONITOR in record mode on the cluster router. 
    	Then check the data it has recorded around the time of the next
    	connection abort event.  I suspect you'll see high activity levels.
    
    James
2059.20. "update on CREATE/TERM delays over DECnet?" by CSC32::FORSMAN (Ginny Forsman 522-4731 CSC/CS) Fri Apr 27 1990 22:53 (27 lines)

    Has anyone come up with a resolution for this problem?
    
    My customer is seeing the same behavior as .0.  VS3100, 16 MB, local
    page/swap disk, booting off a 6430, 96 MB, V5.3.  I have not been able
    to duplicate this here.
    
    From the client,
    SET DISPLAY/CREATE/NODE=vs3100
    CREATE/DET/TERM
    
    and in the DECterm they run WPS, ALL-IN-1, or EDT.  They find a 10-20
    second delay every minute or so.  They can also duplicate this 
    running DECW$EXAMPLES:ICO.
    
    Their NETACP logicals are not defined.  Did a sho sys, and the NETACP
    process had only 246 pgflts.
    
    Again, all works fine using local transport.
    
    All help appreciated.
    
    Thanks,
    Ginny Forsman
    CSC/CS
    
    (not a decnet person, sorry...)
                                            
2059.21. by PSW::WINALSKI (Careful with that VAX, Eugene) Sat Apr 28 1990 21:02 (9 lines)

RE: .20

The delays sound like a network timeout and then retransmission of a lost data
packet.  Go into NCP on both machines and do SHOW CIRCUIT COUNT for the circuit
being used, SHOW LINE COUNT for the line being used, and SHOW NODE COUNT for the
nodes involved.  Look for non-zero timeout, buffer unavailable, and other such
errors.  This definitely sounds like a network problem.

--PSW
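
If the counter displays are captured to a file, something like the sketch below could pick out the suspicious entries. Everything in SAMPLE is made-up illustration data, not output from any system in this topic, and the layout (a right-aligned count followed by the counter name) is an assumption about how the NCP counter display looks. The keyword list beyond "timeout" and "buffer unavailable" is likewise a guess at what "other such errors" might cover.

```python
import re

# Keywords that mark a counter as worth a look when it is non-zero.
# "timeout" and "buffer unavailable" come from the note above; the rest are assumptions.
SUSPECT = ("timeout", "buffer unavailable", "failure", "loss", "overrun")

# Hypothetical captured text; values are invented for illustration only.
SAMPLE = """\
Circuit = SVA-0

      >65534  Seconds since last zeroed
      812345  Terminating packets received
           0  Terminating congestion loss
           3  User buffer unavailable

Remote node = 1.10 (RES3)

          12  Response timeouts
           0  Received connect resource errors
"""

# A counter line is assumed to be: optional '>', digits, whitespace, counter name.
COUNTER_RE = re.compile(r"^\s*(>?\d+)\s+(.+?)\s*$")


def suspicious_counters(text: str) -> list[tuple[str, int]]:
    """Return (name, value) for non-zero counters whose name matches a keyword."""
    hits = []
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if not match:
            continue  # headings and blank lines are skipped
        raw, name = match.groups()
        value = int(raw.lstrip(">"))  # '>65534' means the counter overflowed
        if value > 0 and any(key in name.lower() for key in SUSPECT):
            hits.append((name, value))
    return hits


for name, value in suspicious_counters(SAMPLE):
    print(f"{value:>8}  {name}")
```

Comparing two captures taken a few minutes apart, one before and one after a freeze, would show which counters actually move during the stall.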