
Conference bgsdev::open3d

Title: open3d
Notice: Kits on notes 3 and 4; Documents note 223
Moderator: WRKSYS::COULTER
Created: Wed Dec 09 1992
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1306
Total number of notes: 5260

1285.0. "GLUT test10 perf issues!" by RHETT::HALETKY () Fri May 02 1997 13:38

    Hello,
    
    Why is it that when a customer runs test10 from the GLUT code on an SGI,
    displaying to an Alpha, the performance is nearly ten times faster
    than when the same test is run and displayed on an Alpha?
    
    We cannot replicate this here, but the customer is using SGI code and not
    our examples code. Any attempt to compile our examples on an SGI to
    test against fails.
    
    
    Any answers, reasons, suggestions?
    
    Best regards,
    Ed Haletky
    Digital CSC

1285.1. "what graphics card?" by WRKSYS::RICHARDSON () Fri May 02 1997 14:25

    What graphics card is in the Alpha?
    
    /Charlotte

1285.2. "glut test 10 is glutBitmapCharacter" by VESPER::VESPER (OpenGL Alpha Geek) Fri May 02 1997 14:50

>    Why is it that when a customer runs test10 from the GLUT code on an SGI
>    displaying to an Alpha that the performance is nearly ten times faster
>    than when the same test is run and displayed onto an Alpha?

Let me restate this to be sure I understand.

Configuration 1:
	client runs on SGI box
	server runs on ALPHA box
	transport is TCP/IP
	performance is x frames per second

Configuration 2:
	client runs on ALPHA box
	server runs on the same ALPHA box
	transport is unix:0 or :0 or local:0
	performance is x/10 frames per second
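
As a rough illustration of the two configurations, here is a minimal C sketch that classifies a DISPLAY string by the host part in front of the display number. The names `classify_display` and `enum transport` are made up for this example. It assumes that an empty host, "unix" and "local" mean a local transport (the three forms listed above) and that anything else goes over TCP/IP; DECnet names of the form host::0 are not handled.

```c
#include <string.h>

/* Transport implied by an X display name such as ":0" or "host:0".
 * Hypothetical names, introduced only for this sketch. */
enum transport { TRANSPORT_LOCAL, TRANSPORT_TCP, TRANSPORT_INVALID };

/* Classify a DISPLAY string by the host part before the last ':'.
 * Assumption: an empty host, "unix" and "local" are local transports;
 * any other host name is taken to mean TCP/IP. */
enum transport classify_display(const char *display)
{
    const char *colon;
    size_t host_len;

    if (display == NULL)
        return TRANSPORT_INVALID;

    colon = strrchr(display, ':');
    if (colon == NULL)
        return TRANSPORT_INVALID;   /* no display number present */

    host_len = (size_t)(colon - display);
    if (host_len == 0)
        return TRANSPORT_LOCAL;     /* ":0" */
    if (host_len == 4 && strncmp(display, "unix", 4) == 0)
        return TRANSPORT_LOCAL;     /* "unix:0" */
    if (host_len == 5 && strncmp(display, "local", 5) == 0)
        return TRANSPORT_LOCAL;     /* "local:0" */

    return TRANSPORT_TCP;           /* "hostname:0" */
}
```

Under these assumptions, Configuration 1 corresponds to a DISPLAY that classifies as TRANSPORT_TCP and Configuration 2 to one that classifies as TRANSPORT_LOCAL.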

How is the performance being measured?

test 10 makes a lot of calls to glutBitmapCharacter. That routine
does a lot of inquiries and state settings and one glBitmap call.
The inquiries and state settings are all client-side state, so
the reported difference does not make much sense to me.

However, you also say:

>    We can not replicate here but the customer is using SGI code and not
>    our examples code. Any attempts to compile our examples on an SGI to
>    test against fail.

If you can't get the SGI code that the customer is using, perhaps using
a common source pool will be sufficient for your testing. GLUT is freely
available in source form

 http://reality.sgi.com/mjk_asd/glut3/glut3.html

and you should have no problems getting it to compile on both SGI and
ALPHA boxes.

Andy V

1285.3. "glutbtmap that much slower?" by RHETT::HALETKY () Tue May 06 1997 18:18

    Hello Andy,
    
    Your surmise is correct.
    
    It is 10x slower running locally than via TCP/IP.
    
    
    Also, the /usr/examples/GL directory does not compile on Alpha, so how do
    you expect it to compile on an SGI? Try it some day. I had to
    fiddle with makefiles and code just to get it to compile on the
    Alpha.
    
    
    
    
    Any answers to the questions? Is the GlutBitmap that much faster on the
    client side (SGI) than on an Alpha?
    
    -ed haletky
    

1285.4. "more information is needed" by VESPER::VESPER (OpenGL Alpha Geek) Tue May 06 1997 20:23

>    Any answers to the questions? Is the GlutBitmap that much faster on the
>    client side (SGI) than on an Alpha?

Either that, or the client side program is not the same. That's
why I suggested getting the latest GLUT source bits and compiling them
on both Alpha and SGI and doing that comparison.

What graphics board are you using, anyway? What version of Open3D?
What system box? What operating system?

Andy V

1285.5. "Some config info..." by NNTPD::"rolandr@mail.dec.com" (Rick Roland) Fri May 09 1997 12:44

  I am working with the Account Rep for the account, who asked me to call 
the customer...  I have gathered the following information:
 -- AlphaStation 500/400, 128 MB Memory, 2ea- 4.3 GB Disks
 -- Digital UNIX V4.0B
 -- OpenGL V4.3
 -- 4D20 Graphics Adapter (SGI has 4D85DT graphics board ??)

  Customer has said that he is perceiving the same performance
problem after running the example code (test10) as well!?

Any ideas or comments?

Thanks, Rick Roland

(716) 223-4354   rolandr@mail.dec.com
[Posted by WWW Notes gateway]

1285.6. "Any options?" by NNTPD::"rolandr@mail.dec.com" (Rick Roland) Fri May 09 1997 18:19

  Question: Do we have any options for the issue detailed below??

  The response was from CSC (Ed Haletky), and based on engineering input...
It would appear that this is true "regardless" of graphics adapter used (4D20
in this case...or even 4D40/50/60) if our implementation of GLX is as stated...
Could someone from Engineering verify that there are no workarounds (different
graphics adapter, re-configuration options, code patches, 3rd party X-Servers)
or future plans for support of this feature??

  This customer recently purchased a couple of 8400's and a dozen workstations
and has spent several million with Digital over the last 6 months alone!  They
are a very good customer and have been migrating "off" SGI...  I'd just like
to ensure that we have pursued all options...

Thanks, Rick.



========================================================================
OpenGL on X windows has two pieces. The GLX extension and the code to
contact the graphics adapter. On a Digital machine all that code to 
contact the graphics adapter is in the X Server. So all contact to it is
via the X Server. There is no 'direct' path to the hardware that
ignores the server.

On an SGI box this direct path exists. It bypasses the server completely.
On an AIX box this direct path exists. It bypasses the server completely.
On a Solaris box this direct path exists but through Xgl usage.
On an HP box this direct path exists but via starbase.

So DEC goes through the X server counting on the performance of the CPU
to do the work.


[Posted by WWW Notes gateway]

1285.7. "" by VESPER::VESPER (OpenGL Alpha Geek) Mon May 12 1997 13:38

>OpenGL on X windows has two pieces. The GLX extension and the code to
>contact the graphics adapter. On a Digital machine all that code to 
>contact the graphics adapter is in the X Server. So all contact to it is
>via the X Server. There is no 'direct' path to the hardware that
>ignores the server.


1. If the implementation is fast enough, why does it matter? (A partial
answer - interactivity.)

2. Starting with the Cateyes boards, we are doing a lot more work
in the client-side library. The server is only involved in managing the
hardware, not in the execution of the OpenGL pipeline.

>So DEC goes through the X server counting on the performance of the CPU
>to do the work.

Even with direct access to the hardware we are counting on the performance
of the CPU to do a lot of the work. We have no hardware that does
geometry acceleration.

Andy V

1285.8. "Not Exactly True" by NNTPD::"mir@.eng.pko.dec.com" (Michael Rosenblum) Mon May 12 1997 13:44

The description at the end of .6 is not exactly correct for the PowerStorm 
4D40T, 4D50T, and 4D60T.  These are all DMA based devices, and the gl library
calls on these devices actually generate the hardware specific commands
required to do 3D graphics.  The X Server is used only for synchronization and
management of critical resources like texture memory.  In the normal case, the
server does not touch any of the data that goes to the hardware.  As far as I
am concerned this is a direct path.

You can prove this to yourself by watching the performance monitor on a
graphics intensive benchmark.  

You may also be interested in knowing that we used the same direct path
method to increase the performance of 2D graphics; this is something that
none of our competitors have done.  It gave us a 2 to 4x speedup when
drawing short strings of text, short lists of 2D vectors, X polygons,
and standard non-shared-memory PutImages, to mention a few.

Michael Rosenblum
Project Leader for the PowerStorm 4D40T, 4D50T, and 4D60T
[Posted by WWW Notes gateway]

1285.9. "First things first ..." by WRKSYS::COULTER (If this typewriter can't do it, ...) Mon May 12 1997 13:59

    Before we work too hard on the "wrong problem", could we back
    up here and do what was suggested days ago:
    
    	-- get the same code running on both machines
    
    Either get the SGI source and run it on the Alpha, or get
    the Alpha source and run it on the SGI.  Get the client to
    be the same, and then let's compare results.
    
    (By the way, what happens when you have an Alpha client and
     an SGI server?)
    
    			dick
    

1285.10. "More Info & Questions..." by NNTPD::"rolandr@mail.dec.com" (Rick Roland) Mon May 12 1997 15:46

  I just spoke with the customer and here is the "latest":

o The performance "anomaly" can be replicated using the OpenGL supplied
  example program, referred to as "test10"... this exhibits the same
  performance differential between the SGI and Alpha.

o Performance numbers (UNIX time function) for running this test are as
  follows:
  --> Alpha  - 9.2 realtime,  0.7 user,  1.8 system
  -->  SGI   - 3.2 realtime,  0.34 user, 0.37 system
  Alpha is AlphaStation 500/400 w/4D20, UNIX 4.0B, OpenGL 4.3, 128 MB memory
  SGI is Indigo 2 (200 MHz) w/ XZ Graphics, IRIX 5.3, 64 MB memory
    (according to customer XZ graphics has 2 geometry engines, 1 raster engine,
     24 bit double buffer, Z-buffer, 4 aux. planes, 4 CID (clipping ID) planes)
  ** Can we replicate this environment, running "test10" example programs?
  ** Customer environment is "dark" - government confidential...

o Anomaly is that running the application "local" to Alpha exhibits poor
  performance... however, running the application on SGI workstation and
  "displaying" to Alpha runs as fast as on the SGI local??!
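
For reference, the time figures quoted above can be broken down with a little arithmetic. This is only a sketch: the struct and function names are invented for the example, and the time command accounts only for the CPU used by the timed client process, so whatever the X server or the hardware is doing shows up as time "off the CPU" rather than as user or system time.

```c
/* Figures reported by the UNIX time command, in seconds.
 * Hypothetical type, introduced only for this sketch. */
struct time_report {
    double real;   /* elapsed wall-clock time */
    double user;   /* client CPU time in user mode */
    double sys;    /* client CPU time in kernel mode */
};

/* Elapsed time during which the timed client was not using a CPU. */
double seconds_off_cpu(struct time_report t)
{
    return t.real - (t.user + t.sys);
}

/* The same quantity as a fraction of the elapsed time. */
double fraction_off_cpu(struct time_report t)
{
    return t.real > 0.0 ? seconds_off_cpu(t) / t.real : 0.0;
}
```

With the numbers above, the Alpha run gives 9.2 - (0.7 + 1.8) = 6.7 seconds off the client CPU and the SGI run gives 3.2 - (0.34 + 0.37) = 2.49 seconds. In both cases roughly three quarters of the elapsed time is spent outside the client process, and these figures alone cannot say where that time goes.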

  I was beginning to think that the problem was with the 4D20, and that
we needed to have them upgrade to 4D40/50/60 for direct hardware GL
support... but the last comment above is perplexing?!  Unless the SGI
is doing something in processing the GL code before shipping it to the 
Alpha?  

  OR could this be a configuration issue?  Customer claims that he
can set "direct" or "indirect" rendering on SGI without any significant
difference in performance...

  Any recommendations?

Thanks, Rick.
[Posted by WWW Notes gateway]

1285.11. "We are getting a little closer..." by VESPER::VESPER (OpenGL Alpha Geek) Mon May 12 1997 18:45

>o The performance "anomaly" can be replicated, using the OpenGL supplied
>  example program, referred to as "test10"... this exhibits the same
>  performance differential between the SGI and Alpha.
>  ** Can we replicate this environment, running "test10" example programs?
>  ** Customer environment is "dark" - government confidential...

OK, so we don't know what the real program looks like, just that somebody
thinks that 'test10' is somehow representative of the real program.

I've looked at 'test10' a little closer, and find that the glutBitmapCharacter
calls are likely to be swamped by the time to clear and swap the buffers.
This would be the same whether driven locally or remotely.

>  OR could this be a configuration issue?  Customer claims that he
>can set "direct" or "indirect" rendering on SGI without any significant
>difference in performance...

This is an easy one. Unless you set an environment variable, SGI will
happily use a direct context even if you ask for indirect. Unfortunately
I don't know the name of the environment variable, but I'm sure it is
in SGI documentation somewhere.

>o Anomaly is that running the application "local" to Alpha exhibits poor
>  performance... however, running the application on SGI workstation and
>  "displaying" to Alpha runs as fast as on the SGI local??!

This is still the curious factor, and the first thing I'd do is to
be 100% certain that the same program is being run. I would copy the latest
GLUT sources to each machine and compile them to be sure of that.

Also, using the 'time' command may not be measuring the 'right' thing --
it includes starting up the program and connecting to the server and a
bunch of different things. A more careful timing regime involves such
things as 'priming the pump', calling glFinish() and making sure you run long
enough to avoid timer issues.
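
A minimal sketch of such a timing regime in C, assuming POSIX clock_gettime() is available. The names measure_fps, frame_fn and count_call are hypothetical. In a real test the draw callback would render one frame and swap buffers, and the finish callback would call glFinish(); neither is shown here, so the sketch builds without OpenGL.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical callback type: draws one frame, or drains the pipeline. */
typedef void (*frame_fn)(void *ctx);

/* Stand-in callback that only counts calls. A real test would issue
 * the drawing calls here, or call glFinish() in the finish role. */
void count_call(void *ctx)
{
    ++*(long *)ctx;
}

/* Seconds on a monotonic clock, unaffected by wall-clock adjustments. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Frames per second for draw(), excluding startup costs.
 * warmup_frames primes the pump; min_seconds keeps the run long
 * enough that timer resolution does not dominate the result. */
double measure_fps(frame_fn draw, frame_fn finish, void *ctx,
                   int warmup_frames, double min_seconds, long *frames_out)
{
    long frames = 0;
    double start, elapsed;
    int i;

    for (i = 0; i < warmup_frames; i++)
        draw(ctx);              /* untimed warm-up frames */
    finish(ctx);                /* drain queued work before timing */

    start = now_seconds();
    do {
        draw(ctx);
        frames++;
        elapsed = now_seconds() - start;
    } while (elapsed < min_seconds);

    finish(ctx);                /* count work still queued at the end */
    elapsed = now_seconds() - start;

    if (frames_out != NULL)
        *frames_out = frames;
    return elapsed > 0.0 ? (double)frames / elapsed : 0.0;
}
```

Measured this way, program startup and the connection to the server fall outside the timed region, which is the main difference from wrapping the whole run in the time command.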

Finally, to see if there is a transport problem of some kind, please run
'test10' on the UNIX box with DISPLAY set to :0 and hostname:0 and see if
there is a difference.

Andy V

1285.12. "followup -- indirect contexts for SGI" by VESPER::VESPER (OpenGL Alpha Geek) Tue May 13 1997 12:43

>  OR could this be a configuration issue?  Customer claims that he
>can set "direct" or "indirect" rendering on SGI without any significant
>difference in performance...

:This is an easy one. Unless you set an environment variable, SGI will
:happily use a direct context even if you ask for indirect. Unfortunately
:I don't know the name of the environment variable, but I'm sure it is
:in SGI documentation somewhere.

Set GLFORCEDIRECT to "no" is the answer I saw in the comp.graphics.api.opengl
news group today.

Andy V

1285.13. "Answers to .11" by NNTPD::"rolandr@mail.dec.com" (Rick Roland) Tue May 13 1997 12:55

Andy...  responses to your recommendations:


>Unless you set an environment variable, SGI will happily use a direct context
>even if you ask for indirect. Unfortunately I don't know the name of the
>environment variable, but I'm sure it is in SGI documentation somewhere.

  The customer assures me that they have tested this using the command line
flags "-indirect" and "-direct", which will request the appropriate mode...

>This is still the curious factor, and the first thing I'd do is to
>be 100% certain that the same program is being run. I would copy the latest
>GLUT sources to each machine and compile them to be sure of that.

  The customer did, in fact, copy the same sources to both machines,
compile and run the code... The only other configuration question I asked
was the "specific" version of OpenGL on the SGI... the system doesn't seem
to list the OpenGL version number... it is supposedly "whatever version" 
is shipped with SGI IRIX V5.3 O.S...  so the sources "could" be linking
against libraries from different versions...

>Also, using the 'time' command may not be measuring the 'right' thing --
>it includes starting up the program and connecting to the server and a
>bunch of different things. A more careful timing regime involves such
>things as 'priming the pump', calling glFinish() and making sure you run long
>enough to avoid timer issues.
 True, but they find it hard to believe that it would (consistently) make this
 significant a difference...  (9.2 sec on Alpha... 3.2 on SGI)

>Finally, to see if there is a transport problem of some kind, please run
>'test10' on the UNIX box with DISPLAY set to :0 and hostname:0 and see if
>there is a difference.
  Customer ran both ways...  :0 came up at 9.2 secs (as before)...
                      display:0 came up at 12-13 secs (few seconds slower)

 Question:  I "still" feel that .8 may have had a point, which could explain
  this significant difference... and that is that the 4D40/50/60 series
  graphics adapters provide direct GL routine support in hardware... does
  this make the most sense??  (that the 4D20 adapter has much more overhead)

  Thanks, Rick.

[Posted by WWW Notes gateway]

1285.14. ".13 answers to .12 (sorry)" by NNTPD::"rolandr@mail.dec.com" (Rick Roland) Tue May 13 1997 12:57

Answers in prior reply were to .12 (not .11 as noted)

Sorry, Rick.
[Posted by WWW Notes gateway]