[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference orarep::nomahs::rdb_wish

Title:Oracle Rdb Wishes and Suggestions
Notice:Please READ note 1.0 before using WRITE or REPLY
Moderator:NOVA::SMITHI
Created:Fri Apr 07 1989
Last Modified:Mon Jun 02 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:809
Total number of notes:4111

802.0. "LOAD/UNLOAD by Physical Area" by NOVA::SANTIAGO (I was a teenage net-random.) Wed May 07 1997 15:06

    Ability to load/unload by physical area.
    
    In large (>500GB) environments, it's too timeconsuming to manipulate
    an input stream to correctly affect parallel loading; so if a table
    has strict horizontal partitioning, load the physical areas by
    parallel RMU processes.
    
    For example:
    
    rmu /load root-file-spec table-name /area=physical-area input-file
    
    	and
    
    rmu /unload root-file-spec table-name /area=physical-area output-file
    
    locking of the target area could/should be exclusive on writes, 
    protected on reads; metadata (cardinality) updates should only be at
    the commit interval
    
    /los
T.RTitleUserPersonal
Name
DateLines
802.1NOVA::SMITHIDon't understate or underestimate Rdb!Wed May 07 1997 15:244
Why isn't rmu/load/parallel sufficient?
I really don't understand this suggestion

Ian
802.2NOVA::SANTIAGOI was a teenage net-random.Thu May 08 1997 21:2230
    In enviroments where a database takes time series data (i.e., a
    warehouse), it's much more efficient (IMHO) to load the new target
    area, rather then attempt parallel loading against existing storage.
    The issue is specific to the case I'm working now, but I'd suspect very
    general, hence useful.
    
    My scenario is to build 3 databased: 1, 2, 6 year where each year is
    approx 300GB; data is obtained by month, each being 5.5GB. I'd
    like the ability to
    
    	add the new area(s) or have them pre-created
    	alter map (strict partitioning) to add new area(s)
    	specifically load new area
    
    Most time, loads are yearly, that is I'd want to have 12 streams. The
    input datasets are by month, so a parallel load only sees 1 executor
    busy.  Multiple parallel instances consume too many resources (i.e., 
    156 {12 * (1 parent + 12 siblings)} vs. 12 processes) at system and 
    locking processes).
    
    Now as far as indexes, either drop them before hand. At current rates,
    it's about 40 hours to create the 'small' database's indexes. Perhaps
    a similar wish to create partitioned indexes?
    
    This is a simple design in which partition is by a single field, date;
    When builing a single year, it has proven to be more efficient to 
    partition by other data, but union of several years via dbi still needs
    yearly partitions hence this partitioning scheme.
    
    /los
802.3NOVA::SMITHIDon't understate or underestimate Rdb!Fri May 09 1997 03:2117
Getting closer...

So what should RMU/LOAD be doing?  Surely if all the data falls in the one
partition only one partition will be loaded.  DO you want multiple loaders
writing to the same area?

Please keep describing what you want to see.

~    Now as far as indexes, either drop them before hand. At current rates,
~    it's about 40 hours to create the 'small' database's indexes. Perhaps
~    a similar wish to create partitioned indexes?

Yes we plan to add a parallel index create at some stage.  It will be similar
to the model used by RMU/LOAD/PARALLEL.  In addition some new features in Rdb7
release 2 might be useful also...  more on that when I set them in concrete.

Ian
802.4wishes 'A' and 'B'NOVA::SANTIAGOI was a teenage net-random.Mon May 12 1997 16:1357
    In review, I'm really seeking 2 wishes
    
    A. Rather than attempting a parallel load, use a single stream load to
       load a single physical area.
    B. For parallel loads, allow .PLAN file to name by executor input
    
    I'd like to give rmu/load a hint, that is, I want it to load 1 partition
    of a map where strict partitioning is set.  All the data is to be
    loaded into a single physical area.  For example, let's say in my
    example we have 12 months of data to load, and that partitions are
    named (for '1996)
    
    	data_199601 ... data_199612
    
    and a map stating
    
    	store using date-field
    
    		in data_199601 with limit of '31-jan-1996'
    		...
    		in data_199612 with limit of '31-dec-1996'
    
    then doing a
    
    	rmu /load root table /area=data_199601 199601.data /rms=...
    
    tells rmu to ignore/error all records not meant for the target area
    (I might even go so far as to say 'trust me' that input is fine;-)
    
    subsequently, start 11 more jobs on their respective months' partition
    and input stream; multiple loaders never write to the same area, except
    the root for cardinality if chosen.
    
    now the user can do this manually, that is start 12 individual and
    separate rmu/load processes (assuming wish 'A') -OR- wish 'B' being:
    
    	1. allow the .PLAN file to indicate to the loader/executor where 
           its input stream is coming from.  
    	2. have the parent read the input stream(s) per partition/executor
    	   (or each executor reads it's input stream independently). That
    	   is, the user can indicate the input stream(s) to be read and 
    	   stored by partition
    
    it's hoped that loads by area in this fashion will approach sequential
    write speeds (i.e., a table with a simple map with no partitioning and
    to a single area file) and have minimal locking on the physical and
    logical area.
    
    Having 'A' and 'B' allow the widest flexibilty using the same basic
    capability being to load a physical vs. logical area directly. It would
    be introduced on simple [1 field] partitioning.
    
    A corrally to 'B' is to tell the parallel loaded is being which 
    partitions aren't being loaded, thus saving resources.
    
    /los
                            
802.5NOVA::SMITHIDon't understate or underestimate Rdb!Mon May 12 1997 17:1817
I think what you really want is something like:

	rmu/load/exclude_partitions=(a,c)
or
	rmu/load/include_partitions=(b)

Any data for the exluded partitions would be (a) ignored, or (b) written to an
exceptions file.

Currently rmu/load establishes an executor for each partition.  So this would
be a way to firstly limit the number of executors when the data is partition
specific [saves creating those idle processes], and secondly allows some data
filtering.

Did I get it?

Ian
802.6we're almost home!NOVA::SANTIAGOI was a teenage net-random.Mon May 12 1997 18:0748
    almost, what I'd want is
    
    	rmu/load/include_partitions=(a) ...
    
    to *NOT* conflict with
    
    	rmu/load/include_partitions=(b) ...
    
    and
    
    	rmu/load/include_partitions=(c) ...
    
    this uses the parallel loader, but only 1 executor per job sequence; an
    alternative is to say (note where the input file is specified)
    
    	rmu/load /plan=sys$input root ....
    
    ! plan file
    plan name = load_by_file_plan
    plan type = load
    plan parameters:
    	database root file = root-name
    	table name = table-name
    	! input file = input-file
    	fields = (fields-list)
    	transaction_type = protected
    	! buffers
    	...
    	! rest of database wide param
    end plan pareamters
    
    executor parameters:
    	executor name = executor_1
    	! place_only
    	! exception_file = <none>
    	! RUJ directory = <default>
    	! communications buffers = 12
    	input file = input-file  *OR* (input-file, input-file, ...) wildcard?
    end executor parameters
    
    Yes, any data excluded (by excluding partitions) is ignored or written
    to an exception file; there nit here is allow multiple parallel loaders
    to coexist when each parallel load jobs is loading a strict set of
    partitions, where each partition is is loaded by an input file(s).
    
    /los
    
    
802.7NOVA::SMITHIDon't understate or underestimate Rdb!Mon May 12 1997 18:5920
Still confused...

Why use a parallel loader when there is only one partition?

~    almost, what I'd want is
~    
~    	rmu/load/include_partitions=(a) ...
~    
~    to *NOT* conflict with
~    
~    	rmu/load/include_partitions=(b) ...

RMU Load already knows how to ready a partition for exclusive access :-)
So this would be a given.

~    partitions, where each partition is is loaded by an input file(s).

Right, each loader would require its own data file.

Ian