
Conference pamsrc::decmessageq

Title:NAS Message Queuing Bus
Notice:KITS/DOC, see 4.*; Entering QARs, see 9.1; Register in 10
Moderator:PAMSRC::MARCUSEN
Created:Wed Feb 27 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2898
Total number of notes:12363

2851.0. "SBS prob on EV5 and PCI" by SEFI04::SYSTEM () Wed Apr 16 1997 16:03

    Hi folks,
    	
    I'm seeing a strange behaviour at a customer site with DMQ V3.2B (3252).
    They have 30 nodes (AS500, AS250) in a cluster running OpenVMS V6.2-1H2.
    The problem: during a multicast (SBS) network benchmark that
    enqueues/dequeues messages, all EV5 (AS500) systems, which use a
    dual-rail optimized Ethernet on PCI (DE435), are 10 times slower than
    all *EV4* (AS250) systems, which also use dual-rail optimized Ethernet
    on PCI. So I'm wondering whether this is a known problem with EV5
    processors and dual-rail Ethernet on PCI.
    Thanks in advance.
    
    /Massimo B.
     
2851.1. by PAMSIC::STEPHENS Wed Apr 16 1997 17:24
Hello,

More questions than answers, but even the older VAXes using SBS
could drive the Ethernet to saturation.   I'm curious what your
benchmark shows, what it does, and how you are measuring the
enqueue/dequeue rate.   Could you tell us a bit more: do you have
one or more senders?   Could you describe the stress test and how
it works?

Some of the older AXP systems used TURBOchannel Ethernet cards, which
took a performance hit, but you indicated all these boxes use a pair
of PCI DE435 cards to form the dual rail?

Thank you,
Bruce
2851.2. "more info..." by SEFI04::SYSTEM Fri Apr 18 1997 09:52
    Hi Bruce,
    
    	Thanks for the prompt answer to .0 and sorry for the delay.
    	The network benchmark consists of a transmission program and a
    	receive program, running on different machines.
    	The transmission program periodically sends a 1,368-byte packet
    	and waits 1 sec for the acknowledgement from the receiver.
    	If the receiver gets no errors, the transmission interval from
    	one packet to the next tends to shrink; otherwise it grows.
    	The minimum transmission interval is set to 5 ms and the maximum
    	to 400 ms.
    	Transmission is handled by the broadcast service (SBS), while
    	errors are handled by the communication service (COM).
    	Run on an EV4-architecture machine such as an AS250 4/266, this
    	benchmark shows an average delay of 5 ms, while on an EV5 (AS500)
    	the average delay tends to be greater than 50 ms.
    	The DMQ V3.2A release notes mention that the problem with the
    	direct Ethernet broadcast feature of the SBS Server and message
    	sizes over 1,300 bytes has been corrected and is supposed to
    	work in V3.2B.
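
    	[Editor's sketch] The adaptive interval logic described above can
    	be sketched roughly as follows. The halve/double adjustment step
    	and the function names are assumptions; the note only says the
    	interval shrinks on success and grows on errors, bounded by 5 ms
    	and 400 ms.

```python
# Sketch of the adaptive inter-packet interval described in .2.
# The halve/double step is an assumption, not taken from the note.

MIN_INTERVAL = 0.005   # 5 ms floor
MAX_INTERVAL = 0.400   # 400 ms ceiling

def next_interval(current, ack_ok):
    """Shrink the send interval after a clean ack, grow it after errors."""
    if ack_ok:
        return max(MIN_INTERVAL, current / 2)
    return min(MAX_INTERVAL, current * 2)

# A healthy run converges to the 5 ms floor...
interval = MAX_INTERVAL
for _ in range(10):
    interval = next_interval(interval, ack_ok=True)
print(interval)            # 0.005

# ...while persistent errors drive it back to the 400 ms ceiling.
for _ in range(10):
    interval = next_interval(interval, ack_ok=False)
print(interval)            # 0.4
```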
    
    	Let me know.
    
    	/M.Balladelli
2851.3. by PAMSIC::STEPHENS Fri Apr 18 1997 13:01
Hi.  

I'm trying to go over what you wrote about the test and the numbers
you gave.   From your description, I believe you burst x packets,
then wait to hear from the receiver whether they made it OK.
If OK, you burst x plus some additional number; if bad, you burst
x minus some number.

From your results, 5 msec per packet is 200 packets/sec of ~1500
bytes, about 300 Kbytes/sec down the Ethernet.   Depending on what
else is going on on the wire, you should be able to do this easily;
in fact I would expect you could achieve a throughput in the
700-800 Kbyte/sec range on a dedicated wire with your fast hardware.
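
[Editor's check] A quick check of the arithmetic above, using the
~1500-byte frame size Bruce assumes and the 1,368-byte payload from .2:

```python
# One packet every 5 ms is 200 packets per second.
packets_per_sec = 1000 // 5
print(packets_per_sec)              # 200

# At ~1500 bytes per Ethernet frame, that is Bruce's ~300 Kbytes/sec.
print(packets_per_sec * 1500)       # 300000

# With the 1,368-byte payload from .2 it is about 274,000 bytes/sec.
print(packets_per_sec * 1368)       # 273600
```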

Obtaining high throughput comes down to two key things:

First and foremost, the receiver must be configured to put the data
 somewhere, otherwise the bits go off into the ether.   We do have
 a knob to turn for SBS in DMQ$SET_SERVER_LOGICALS.COM in DMQ$USER:
 under the SBS section there is a logical name for the number of
 receive buffers the Ethernet driver will allocate for each channel
 that the SBS Server uses.  The logical name
 DMQ$SBS_ETH_DRVR_BUFFERS defaults to 16.   Increasing this number
 will affect BUFLM for the SBS Server, so you must go to the
 DMQ$SET_SERVER_QUOTAS.COM file in the same directory and increase
 BUFLM for SBS as you raise DMQ$SBS_ETH_DRVR_BUFFERS.   Normally
 you should be able to take this value up into the 50-70 range
 without sysgen parameter reconfiguration (NPAGEDYN, etc.).

Second, your transmitter cannot burst the data; it must pace the
 sends down the wire.   From the description in .2 you seem to be
 doing this in your test, but to make it clear: you want to spread
 the writes out over the window of measurement.  For example, if
 you want to send 200 packets in one second, your test should:
 write, wait .005 sec, write, wait .005 sec, write, etc.
 You should NOT calculate "I need to send 200 packets per second"
 and then do write, write, write, etc.   What will happen,
 especially on fast hardware, is that the data goes out on the wire
 in the first part of the second, which tends to overrun the
 receiver and drop the data.
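
[Editor's sketch] A minimal sketch of the paced sender Bruce describes.
The `send` callback is a placeholder; a real test would hand each
packet to SBS at that point.

```python
import time

def paced_send(send, packets, interval=0.005):
    """Write one packet, sleep `interval` seconds, write the next, so
    the load is spread across the second instead of bursting and
    overrunning the receiver."""
    for pkt in packets:
        send(pkt)
        time.sleep(interval)

# Stand-in transport: just collect the writes.
sent = []
paced_send(sent.append, [b"x" * 1368] * 10, interval=0.001)
print(len(sent))        # 10
```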
                                                

As to why the behaviour differs between these hardware/software
configurations, I don't know, but please let us know what you learn.

Hope this helps, 
Bruce
2851.4. "still doesn't work..." by SEFI04::SYSTEM Fri Apr 18 1997 13:51
    Bruce,
    	I've already raised DMQ$SBS_ETH_DRVR_BUFFERS to 128, but the
    	problem remains, while the AS250 works fine without modifying
    	this value. So what is the difference between these two machines?
    	I also see the same behaviour on the AS8400 (EV5) using dual-rail
    	Ethernet on PCI, while two AS8400s with XMI Ethernet (DEMNA) run
    	the bench fine. So it seems the problem is located in EV5
    	machines with Ethernet on PCI.
    	Reducing the packet size from 1,368 to 1,300 bytes makes the
    	bench work fine on all platforms (EV5, EV4, PCI, XMI).
    Let me know.
    M.Balladelli
2851.5. by PAMSIC::STEPHENS Fri Apr 18 1997 14:53
Hi,

I think your question is going outside my area of knowledge, but
here are a couple of suggestions:

1) Try using DTSEND.  In order to run DTSEND, you either need a
 valid DTR$SERVER user account on the DTR object, or a nonprivileged
 DECnet default username and password.

 $ mc dtsend
 _Test: connect/node=yomama
 %NET-S-NORMAL, normal successful completion

 _Test: data/size=512/seconds=30/type=sink/speed=10000000
 _Test: data/size=512/seconds=30/type=echo/speed=10000000

 _Test: data/size=1498/seconds=30/type=sink/speed=10000000
 _Test: data/size=1498/seconds=30/type=echo/speed=10000000

    
2) If the test above shows the rate difference you are seeing between
the EV4/5 systems, then DmQ/SBS is off the hook...   Drop a note in 
the HUMANE::ETHERNET notes conference with your findings and a network 
topology map.   Otherwise, if you see this behavior only with SBS, then
I'll need to dig further and see what/where our software is having
problems.


Hope this helps,
Bruce

2851.6. "fixed" by SEFI04::SYSTEM Tue May 06 1997 09:50
    Hi,
    	The problem has been fixed with the ALPSYS06_070 patch kit.
    
    /Massimo Balladelli