
Conference orarep::nomahs::dbintegrator_public_public

Title:DB Integrator Public Conference
Notice:Database Integration - today! Kit/Doc info see note 36
Moderator:BROKE::ABUGOV
Created:Mon Sep 21 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1171
Total number of notes:5187

1045.0. "Performance tuning check list" by ORAREP::STKHLM::KNORN ("Stefan Knorn SI Stockholm") Tue Apr 16 1996 15:06

    
    	Soon a customer will start to perform load tests on their
    	database, which is in fact a DBI database.
    	The physical databases (all Rdb, btw) are located on the same
    	node as the DBI database.
    	Initially a lot of data will be inserted into the physical
    	databases, accessed via DBI.
    	I'd appreciate any performance tuning hints (pointers into the
    	documentation etc.) on what to check prior to starting the
    	data load.
    	How much overhead should be expected when using DBI to perform
    	SQL INSERTs compared to doing the same directly on the
    	physical database?
    
    	Stefan Knorn
    	SI Stockholm
1045.1. by BROKE::HANCKEL, Thu Apr 25 1996 15:17 (29 lines)
    
    
    Loading costs can vary widely depending on what mechanisms and
    gateways you are using, as well as what is serving as source
    and destination.
    
    For example, if you are loading an imported RDB table directly from
    an application that is providing the data, you should expect
    performance very close to loading the RDB table directly.
    The pathlength and overhead for DBI should be very low, since
    DBI is acting like a dispatcher and there is little DBI processing
    involved.  If you are loading an updatable horizontal partition
    view from an application, there will be slightly more overhead,
    because DBI needs to look at the partitioning column and pick the
    right place to do the insert.  Again, this overhead is not very big.
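
    The extra work for a partitioned-view insert can be pictured with a
    small sketch (Python, purely illustrative; the partition bounds and
    database names below are invented, and DBI of course does the
    equivalent lookup internally in its executor):

    ```python
    # Illustrative sketch of horizontal-partition routing on insert.
    # Bounds and member-database names are hypothetical examples.

    PARTITIONS = [
        # (exclusive upper bound on the partitioning column, target db)
        (10000, "R51_UDB_1"),
        (20000, "R51_UDB_2"),
    ]

    def route_insert(l_orderkey):
        """Pick the member database whose key range covers l_orderkey."""
        for upper, dbname in PARTITIONS:
            if l_orderkey < upper:
                return dbname
        raise ValueError("key %d is outside all partition ranges" % l_orderkey)
    ```

    Per row, this is just a comparison or two against the partition
    bounds, which is why the overhead stays small.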
    
    Inserts from a SELECT clause could be slower depending on the
    nature of the SELECT clause and the size of the aggregate data
    selected.  Data may need to be accumulated in DBI's workspace
    before the insertion part of the work can proceed.  The next
    release of DBI has a much better dataflow model (via improved
    fast-first features), which means less aggregation.  INSERT
    statements using SELECT clauses with ORDER BYs or implicit
    sorting (e.g. GROUP BY) will always force DBI (or any engine)
    to evaluate the SELECT in its entirety (and store the interim
    results in virtual memory or an on-disk workspace) before the
    insert work can proceed.
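
    The difference between the two dataflow models can be sketched like
    this (Python, with made-up row sources standing in for the SELECT
    side of an INSERT ... SELECT):

    ```python
    # Sketch of the two dataflow models for INSERT ... SELECT.
    # "Fast first": each source row is handed to the insert side as it
    # arrives.  An ORDER BY (or GROUP BY) forces the whole SELECT
    # result to be materialized and sorted before the first insert.

    def select_rows():
        # stand-in for the SELECT side; rows arrive in source order
        yield from [(3, "c"), (1, "a"), (2, "b")]

    def insert_fast_first(sink):
        # streaming: no interim buffer, rows inserted as they arrive
        for row in select_rows():
            sink.append(row)

    def insert_with_order_by(sink):
        # ORDER BY: the entire SELECT result must be buffered (in
        # memory or an on-disk workspace) and sorted before inserting
        for row in sorted(select_rows()):
            sink.append(row)
    ```

    In the first case the interim storage is one row; in the second it
    grows with the size of the selected data, which is what drives the
    virtual-memory numbers in the tests below.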
    
    
    
1045.2. "Some info..." by BROKE::ABUGOV, Fri Apr 26 1996 21:02 (97 lines)
    
    Hi Stefan,
    
    Here is some raw data gathered by Frank Lee.  The bottom line though
    is that we don't believe you will see a large deviation from directly
    inserting into an Rdb database.  Note that these tests were run on our
    cluster (which is active) although Frank did multiple runs of each to
    try to get accurate data.
    

	Tables added to the catalog for this analysis:

lineitem0: in TEST$DATABASE:R51_UDB_1.RDB (row size: 201, populated w/ 64k rows)
lineitemr: in TEST$DATABASE:R51_UDB_1.RDB (same table definition as lineitem0)
lineitemra: in TEST$DATABASE:R51_UDB_2.RDB (same table definition as lineitem0)

Test results are as follows.

I. lineitem0 to lineitemr using Rdb only:

SQL> attach 'f TEST$DATABASE:R51_UDB_1.RDB';

--- Test case A:
SQL> insert into lineitemr (select * from lineitem0 where l_orderkey < 10000);
9965 rows inserted
....
--- Test case B:
SQL> insert into lineitemr (select * from lineitem0 where l_orderkey<20000);
20058 rows inserted
....

 Test      Average	   Average	    Average          Average 
 Case     elapsed time     CPU time        Page Faults        DIO
  A         52 sec.         10.5 sec.          57             11908 	    
  B         74 sec.         24.7 sec.         116             14272

A memory leak of ~100 bytes per record was observed. This is an RDB problem
and a QAR was filed against RDB V7.0.


II. lineitem0 to lineitemr using DBI (DBI_OPTIMIZATION_LEVEL is TOTAL TIME):

$DEF SQL$DATABASE /TYPE=DBI/DBNAME=DOC21:[F_LEE.LOCAL_TEST.CATALOG]CAT_4_2_515

$dbisql
SQL> attach 'f sql$database';

--- Test case A:
SQL> insert into lineitemra (select * from lineitem0 where l_orderkey < 10000);
9965 rows inserted
....
--- Test case B:
SQL> insert into lineitemra (select * from lineitem0 where l_orderkey<20000);
20058 rows inserted
....

 Test      Average         Average          Average          Average   
 Case     elapsed time     CPU time        Page Faults        DIO     
  A          52             15.6               57             11885     
  B          76             25.0              116             14278  
 
	In this mode of operation, DBI materialized the data collection
from lineitem0 before inserting into the target table. Each tuple used
about 500 bytes: for case A, virtual memory grew by 8924 pages; for
case B, by 19920 pages. The memory leak per record is ~100 bytes
(the same as in test I).
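
    The ~500 bytes/tuple figure can be roughly sanity-checked from the
    page growth, assuming the standard 512-byte OpenVMS pagelet (an
    assumption on our part; the exact page accounting may differ):

    ```python
    # Rough check of the ~500 bytes/tuple figure from virtual-memory
    # growth, assuming 512-byte OpenVMS pagelets.

    PAGELET = 512  # bytes per pagelet (assumed)

    def bytes_per_tuple(pages, rows):
        return pages * PAGELET / rows

    case_a = bytes_per_tuple(8924, 9965)     # roughly 459 bytes/tuple
    case_b = bytes_per_tuple(19920, 20058)   # roughly 508 bytes/tuple
    ```

    Both cases come out in the neighborhood of 500 bytes per tuple,
    consistent with the observation above.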
    
III. lineitem0 to lineitemra using DBI (DBI_OPTIMIZATION_LEVEL is FAST FIRST):

Similar setup as in II.


 Test      Average         Average          Average          Average
 Case     elapsed time     CPU time        Page Faults        DIO
  A         53              15.7              57              11886
  B         73              24.3             116              14278      

	In this mode of operation, DBI repeatedly fetched a tuple from
lineitem0 and inserted it into lineitemra, without buffering the
source tuples. No additional virtual memory was used for the
buffering. However, the memory leak per record is about the same
(~100 bytes/record).
    
IV. Summary:


For the insert statements tested with TOTAL TIME and FAST FIRST optimization,
   the results in II and III do not show a significant difference in
   performance. With only two data points, it is hard to say which is
   better. However, even though the I/O counts are the same, buffering a
   large number of source tuples will definitely affect performance under
   the TOTAL TIME optimization.

   Until we have more performance data, we recommend that users 
   set the DBI optimization level to FAST FIRST for this type of 
   SQL operation to avoid the problem of running out of virtual memory.