Conference decwet::nt-developers

Title: MS Windows NT Developers
Notice: See note 1222 for MS bug reporting info
Moderator: TARKIN::LINEIBER
Created: Mon Nov 11 1991
Last Modified: Tue Jun 03 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 3247
Total number of notes: 15633

3238.0. "Named Pipes and SMP" by NNTPD::"rogerg@mail.dec.com" (Genne Roger) Thu May 08 1997 17:37

We have an Internet content provider with a C++ application written for NT 4.0
named pipes.  The application, running on an Alpha 4100, runs 17% slower on
2 CPUs than on a single CPU.  The ASAP group confirmed this and also showed
that as we add CPUs, our throughput, measured in resolutions/sec, degrades
further.  The same thing happens on an Intel machine.  Our field engineers
have confirmed that we are on the latest firmware releases and HAL, so we are
down to determining whether the application is written incorrectly for SMP or
there is a bug in NT (hmm?).
Is anyone willing to take a shot at this?  We need someone who knows how to
write C++ for SMP to help us or tell us where we went wrong.  Ten 4100s are
riding on this versus our friends at Sun.
Cross-posted in Windows-NT and C++.
Thanks for any help/suggestions,
Genne Roger
510-233-3386
[Posted by WWW Notes gateway]
T.R  Title  User  Personal Name  Date  Lines
3238.1  HYDRA::CHIN  Fri May 09 1997 18:36  172

  RE:  -1 

  Also cross-posted at C_PLUS_PLUS #3564, NT-DEVELOPER #3238


  Brief description of the test programs, RAMDBSER.exe and RAMDBABU.exe:

  The job of the ramdb server is to look up entries in a
  giant hash table, which is supposed to run very fast.  It is a multithreaded
  application written in VC++ 4.0.  Three threads are spawned
  by ramdbser.exe.  Two of them are mostly idle: one is a UI thread, which
  gets very little time during the course of execution, and the other
  is the DB sweep thread, which is designed to be very lightweight.
  Basically, it just checks whether a record has aged out of the
  database and marks it dead.
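
  The lookup and age-out behaviour described above could be sketched roughly
  as follows.  This is not the customer's code: the Record layout, the Table
  type, and the sweep() and resolve() names are all made up for illustration,
  and time is just a plain counter.  A real server would also need some form
  of synchronization between the sweep thread and the lookup thread, which is
  left out here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical record layout: the real ramdb format is not shown in this note.
struct Record {
    std::string value;   // whatever a "resolution" returns
    long last_update;    // time of last refresh, in arbitrary ticks
    bool dead;           // set by the sweep once the record has aged out
};

using Table = std::unordered_map<std::string, Record>;

// Sweep pass: mark every live record older than max_age as dead.
// Returns the number of records newly marked on this pass.
std::size_t sweep(Table& table, long now, long max_age) {
    std::size_t marked = 0;
    for (auto& entry : table) {
        Record& r = entry.second;
        if (!r.dead && now - r.last_update > max_age) {
            r.dead = true;
            ++marked;
        }
    }
    return marked;
}

// Lookup pass: a dead record is treated as if it were absent.
const std::string* resolve(const Table& table, const std::string& key) {
    auto it = table.find(key);
    if (it == table.end() || it->second.dead) {
        return nullptr;
    }
    return &it->second.value;
}
```

  Marking a record dead instead of erasing it keeps the sweep cheap, which
  fits the description of the sweep thread as very lightweight.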

  The customer, a Sun shop, is trying to understand why this application
  does not scale on WNT SMP systems (Intel and Alpha).  The customer
  strongly suspects a named pipe problem in NT.  The application is
  designed to test the serial resolution efficiency of the named pipe
  mechanism.

  For those who would like to take a look, the programs are available by ftp
  at fluid:/pub/RAMDB/Alpha/ramdb1.zip, and ../Intel/ramdb2.zip.


  Two of my reports are enclosed below, with names removed to 'protect' the
  innocent.


  Again, there is a sale pending on the SMP server performance.
  Any help would be greatly appreciated.

  Thanks, 
  Miller  

-------------------------------------------------------------------
Report#2  5/8/97 to the customer

I have completed the Pentium Pro 200 (2 CPU) test, and the
result is 1500 res/sec, which is slower than the 1700 res/sec for a
single-CPU Pentium Pro 200.  So the programs
RAMDBABU.exe and RAMDBser.exe did not scale well (they actually
lost some performance) on both the Intel/NT and Alpha/NT
platforms.  We have to look at the following closely:



The Digital Personal Workstation 200i2
    (dual cpu pentium pro 200) result:
    ---------------------------------

Sys configuration: 128MB memory, 512KB cache, NT server 4.0 SP2, 
    pagefile size 128MB,
    2CPU active, Display: Matrox 4MB 1024x768x256
    Disk: 2GB

    DPW 200i2   2CPU  -- 6670ms, 1500 res/sec     (tested in MRO lab)
    
    Comparisons with other systems:     
                 4100 5/300  (hal.dll 99,520KB, 12/14/97 multiprocessor)
                 1CPU      5550ms, 1801 res/sec
                 2CPU      6352ms, 1574 res/sec
                 3CPU      6195ms, 1614 res/sec
                 4CPU      6788ms, 1473 res/sec

      The data ONLIFE reported are:
      ------------------------- 
          1700 res/sec   for Pentium Pro 200, single CPU
          2300 res/sec   for AlphaServer 4100, 1CPU, 466MHz 2GB RAM
          1900 res/sec   for AlphaServer 4100, 2CPU, 466MHz, 4GB RAM
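
As a rough cross-check of the figures above: each elapsed time multiplied by
its res/sec figure comes to roughly 10,000, so the sketch below assumes that
every run performs 10,000 resolutions.  That run length is an inference from
the numbers, not something stated in the reports, and the helper names are
made up for illustration.

```cpp
#include <cassert>

// Throughput in resolutions per second for a run of `resolutions` lookups
// that took `elapsed_ms` milliseconds.
double res_per_sec(double resolutions, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    return resolutions * 1000.0 / elapsed_ms;
}

// Throughput of an N-CPU run relative to the 1-CPU baseline, for the same
// amount of work.  1.0 means no change; below 1.0 means the run got slower.
double relative_throughput(double baseline_ms, double elapsed_ms) {
    assert(baseline_ms > 0.0 && elapsed_ms > 0.0);
    return baseline_ms / elapsed_ms;
}
```

With the 4100 5/300 timings, relative_throughput(5550, 6352) is about 0.87
for 2 CPUs, relative_throughput(5550, 6195) about 0.90 for 3 CPUs, and
relative_throughput(5550, 6788) about 0.82 for 4 CPUs.  In other words, none
of the SMP runs gets back to the single-CPU throughput.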

The rest of the statistics from PERFMON, PVIEW, and NT Task
Manager on the Pentium Pro look very similar to those on the AlphaServer
(i.e., high syscalls/sec, context switches/sec, % processor time, thread
context switches, etc.).  The working set on the Pentium Pro is smaller,
though (2120KB for Ramserver on Intel vs. 2808KB on Alpha).



----------------------------------------------------------------------
Report #1  5/7/97  to the customer.


I completed the first round of tests; the results and analysis
are attached.  I also need a few things from
you to proceed further:

  1. I tried to rebuild ramdbabu.exe with VC50, but ramdbabuse.h
     was missing.  Can you forward me a copy?
     I think the ramdbabu.h is for ramdbabuseDlg.h.

  2. I have a dual Pentium Pro 200i system here.  Genne told me
     that the result you got on a single Pentium was 1700 res/sec.
     Can you send me the Intel version of ramdb so I can see
     how an Intel SMP server behaves?

  3. What exactly is the ramdb workload trying to do?  


The AlphaServer 4100 test results
---------------------------------
System configuration:
      AlphaServer 4100 (EV5 - 21164 chip) 5/300   300MHz
         Note: This system is not exactly the same as yours.  I believe
         your AlphaServer 4100 466MHz is an EV56 - 21164A system.
         Also, your 466MHz 1-CPU result (2300 res/sec) is higher
         than mine, possibly due to the clock speed.

      Memory: 524MB, Cache 2MB, Disk: 4GB NTFS.
      NT: V4.0 SP1

      4100 5/300 1CPU      5550ms, 1801 res/sec
                 2CPU      6352ms, 1574 res/sec
                 3CPU      6195ms, 1614 res/sec
                 4CPU      6788ms, 1473 res/sec

      The data you reported are:
      ------------------------- 
          1700 res/sec   for Pentium Pro 200, single CPU
          2300 res/sec   for AlphaServer 4100, 1CPU, 466MHz 2GB RAM
          1900 res/sec   for AlphaServer 4100, 2CPU, 466MHz, 4GB RAM


Analysis
--------
- No Alignment Fixups
- No Floating Point Emulations
  Note: A high number of fixups or emulations will cause a big performance
        degradation on Alpha.

- Sys calls/sec is 16,094 max
- 100% CPU usage
- Working set is 2808KB
- Multithreading
      RAMDBABU has 1 thread (81% privileged mode, 19% user mode,
                             context switches/sec 466201)
      RAMSERVER has 3 threads
          Thread  Priv mode  User mode  Cont sw              Dynamic priority
          0       96%        4%         1908                 14
          1       89%        11%        239571 for 2 CPUs    15
                                        429937 for 3 CPUs
          2       0%         0%                              1

  Observations:
  1. Judging by the results, there seems to be a scalability issue on the
     AlphaServer.  However, we have to run the same tests on an Intel server
     on NT 4.0.  The issue could be related to NT, the OS.

  2. There is a large number of sys calls/sec and context switches,
     especially for server thread #1.  Context switches also increase
     after adding extra CPUs.

     What are server threads #0 and #1 doing?
     Why is server thread #2 doing nothing?

     I am interested to see how an Intel SMP server handles the test.
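
  One pattern that would be consistent with the high context switch counts is
  a strictly serial request/reply exchange, where the client blocks until each
  reply arrives.  The sketch below is only an illustration of that pattern
  under stated assumptions.  It is not the customer's code, and it swaps the
  named pipe for a std::mutex and std::condition_variable so that it stays
  portable.  Channel, serve(), and run_serial_client() are made-up names.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for one pipe connection: a single request slot and
// a single reply slot, guarded by one mutex.
struct Channel {
    std::mutex m;
    std::condition_variable cv;
    bool request_ready = false;
    bool reply_ready = false;
    bool shutdown = false;
    int payload = 0;
    long handoffs = 0;   // times one side posted data the other must wake for
};

// Server side: wait for a request, "resolve" it, post the reply.
void serve(Channel& ch) {
    std::unique_lock<std::mutex> lk(ch.m);
    for (;;) {
        ch.cv.wait(lk, [&ch] { return ch.request_ready || ch.shutdown; });
        if (!ch.request_ready) {
            return;                  // woken for shutdown only
        }
        ch.request_ready = false;
        ch.payload += 1;             // stand-in for the hash table lookup
        ch.reply_ready = true;
        ++ch.handoffs;
        ch.cv.notify_all();
    }
}

struct Result {
    int last_reply;
    long handoffs;
};

// Client side: issue n requests one after another, blocking on each reply.
Result run_serial_client(int n) {
    Channel ch;
    std::thread server(serve, std::ref(ch));
    int last = 0;
    for (int i = 0; i < n; ++i) {
        std::unique_lock<std::mutex> lk(ch.m);
        ch.payload = i;
        ch.request_ready = true;
        ++ch.handoffs;
        ch.cv.notify_all();
        ch.cv.wait(lk, [&ch] { return ch.reply_ready; });
        ch.reply_ready = false;
        last = ch.payload;
    }
    {
        std::lock_guard<std::mutex> lk(ch.m);
        ch.shutdown = true;
    }
    ch.cv.notify_all();
    server.join();
    return Result{last, ch.handoffs};
}
```

  Every round trip costs two handoffs, and at any moment only one of the two
  threads has work to do, so a second CPU has nothing extra to run.  Whether
  this is what ramdbabu and ramdbser actually do remains to be confirmed from
  the sources.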


Next steps:
I have disassembled ramdbabu.exe and ramdbser.exe to look at the
machine code.  The code was compiled without debug or
/Zh switches, which could otherwise impact performance.


---------------------------------------------------------------------
  
3238.2  HYDRA::CHIN  Fri May 09 1997 18:41  116

  Here is a portion of the dump files (also at the fluid ftp site).  -Miller


Dump of file ramdbabu.exe

File Type: EXECUTABLE IMAGE

  00402000: 23DEFFF0 lda           sp,0xFFF0(sp)
  00402004: A2010010 ldl           a0,0x10(t0)
  00402008: B75E0000 stq           ra,0(sp)
  0040200C: D34003EC bsr           ra,00402FC0
  00402010: A75E0000 ldq           ra,0(sp)
  00402014: 23DE0010 lda           sp,0x10(sp)
  00402018: 6BFA8001 ret
  0040201C: 47FF041F nop
  00402020: 23DEFFE0 lda           sp,0xFFE0(sp)
  00402024: 47EC9411 mov           0x64,a1
  00402028: B75E0000 stq           ra,0(sp)
  0040202C: 47FF0412 clr           a2
  00402030: B7FE0008 stq           zero,8(sp)
  00402034: 63FF0000 trapb
  00402038: B21E0010 stl           a0,0x10(sp)
  0040203C: 43F00010 sextl         a0,a0
  00402040: B3DE000C stl           sp,0xC(sp)
  00402044: D34003E2 bsr           ra,00402FD0
  00402048: A01E0010 ldl           v0,0x10(sp)
  0040204C: 247F0040 ldah          t2,0x40
  00402050: 63FF0000 trapb
  00402054: 47E03401 mov           1,t0
  00402058: 206341A0 lda           t2,0x41A0(t2)
  0040205C: B03E0008 stl           t0,8(sp)
  00402060: B0600000 stl           t2,0(v0)
  00402064: 20DFFFFF mov           0xFFFF,t5
  00402068: A75E0000 ldq           ra,0(sp)
  0040206C: A01E0010 ldl           v0,0x10(sp)
  00402070: B0DE0008 stl           t5,8(sp)
  00402074: 63FF0000 trapb
  00402078: 23DE0020 lda           sp,0x20(sp)
  0040207C: 6BFA8001 ret
  00402080: 6BFA8001 ret
  00402084: 00000000 call_pal      halt
  00402088: 00000000 call_pal      halt
  0040208C: 00000000 call_pal      halt
  00402090: 243F0041 ldah          t0,0x41
  00402094: A001A2C0 ldl           v0,0xA2C0(t0)
  00402098: 6BFA8001 ret
  0040209C: 00000000 call_pal      halt
  004020A0: 241F0040 ldah          v0,0x40
  004020A4: 20004000 lda           v0,0x4000(v0)
  004020A8: 6BFA8001 ret
  004020AC: 00000000 call_pal      halt
  004020B0: A0210018 ldl           t0,0x18(t0)
  004020B4: 23DEFFF0 lda           sp,0xFFF0(sp)
  004020B8: B75E0000 stq           ra,0(sp)
  004020BC: 22010080 lda           a0,0x80(t0)

   .
   . 
   .


Dump of file ramdbser.exe

File Type: EXECUTABLE IMAGE

  00402000: 243F0041 ldah          t0,0x41
  00402004: A001C48C ldl           v0,0xC48C(t0)
  00402008: 6BFA8001 ret
  0040200C: 00000000 call_pal      halt
  00402010: 241F0040 ldah          v0,0x40
  00402014: 20006000 lda           v0,0x6000(v0)
  00402018: 6BFA8001 ret
  0040201C: 00000000 call_pal      halt
  00402020: A2010010 ldl           a0,0x10(t0)
  00402024: 23DEFFF0 lda           sp,0xFFF0(sp)
  00402028: B75E0000 stq           ra,0(sp)
  0040202C: D3400774 bsr           ra,00403E00
  00402030: A75E0000 ldq           ra,0(sp)
  00402034: 23DE0010 lda           sp,0x10(sp)
  00402038: 6BFA8001 ret
  0040203C: 47FF041F nop
  00402040: 23DEFFE0 lda           sp,0xFFE0(sp)
  00402044: 47FF0411 clr           a1
  00402048: B75E0000 stq           ra,0(sp)
  0040204C: B7FE0008 stq           zero,8(sp)
  00402050: 63FF0000 trapb
  00402054: B21E0010 stl           a0,0x10(sp)
  00402058: 43F00010 sextl         a0,a0
  0040205C: B3DE000C stl           sp,0xC(sp)
  00402060: D340076B bsr           ra,00403E10
  00402064: A01E0010 ldl           v0,0x10(sp)
  00402068: 63FF0000 trapb
  0040206C: 247F0040 ldah          t2,0x40
  00402070: 47E03401 mov           1,t0
  00402074: 20636088 lda           t2,0x6088(t2)
  00402078: B03E0008 stl           t0,8(sp)
  0040207C: B0600000 stl           t2,0(v0)
  00402080: 20DFFFFF mov           0xFFFF,t5
  00402084: A75E0000 ldq           ra,0(sp)
  00402088: A01E0010 ldl           v0,0x10(sp)
  0040208C: B0DE0008 stl           t5,8(sp)
  00402090: 63FF0000 trapb
  00402094: 23DE0020 lda           sp,0x20(sp)
  00402098: 6BFA8001 ret
  0040209C: 00000000 call_pal      halt
  004020A0: 23DEFFF0 lda           sp,0xFFF0(sp)
  004020A4: 22010018 lda           a0,0x18(t0)
  004020A8: B75E0000 stq           ra,0(sp)
  004020AC: D3400038 bsr           ra,00402190
  004020B0: A75E0000 ldq           ra,0(sp)
  004020B4: 23DE0010 lda           sp,0x10(sp)
  004020B8: 6BFA8001 ret
  004020BC: 47FF041F nop
  004020C0: 23DEFE70 lda           sp,0xFE70(sp)
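
For anyone reading the dumps above: the load, store, and lda/ldah lines are
Alpha memory-format instructions, which pack a 6-bit opcode, two 5-bit
register numbers, and a signed 16-bit displacement into one 32-bit word.  The
sketch below splits those fields apart.  The struct and function names are
made up for illustration, and it only covers the memory format, not branches
or operate instructions.

```cpp
#include <cstdint>

// Fields of an Alpha memory-format instruction.
struct MemInsn {
    unsigned opcode;  // bits 31..26
    unsigned ra;      // bits 25..21
    unsigned rb;      // bits 20..16
    int disp;         // bits 15..0, sign-extended
};

MemInsn decode_mem(std::uint32_t word) {
    MemInsn d;
    d.opcode = word >> 26;
    d.ra = (word >> 21) & 0x1F;
    d.rb = (word >> 16) & 0x1F;
    d.disp = static_cast<int>(word & 0xFFFF);
    if (d.disp & 0x8000) {
        d.disp -= 0x10000;  // sign-extend the 16-bit displacement
    }
    return d;
}
```

Decoding 0x23DEFFF0 gives opcode 0x08 (lda), ra = 30, rb = 30 (sp is r30),
and a displacement of -16.  So the 0xFFF0 shown in the dump is really -16,
and the instruction allocates a 16-byte stack frame.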