[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference orarep::nomahs::rdb_wish

Title:	Oracle Rdb Wishes and Suggestions
Notice:	Please READ note 1.0 before using WRITE or REPLY
Moderator:	NOVA::SMITHI

Created:	Fri Apr 07 1989
Last Modified:	Mon Jun 02 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	809
Total number of notes:	4111

802.0. "LOAD/UNLOAD by Physical Area" by NOVA::SANTIAGO (I was a teenage net-random.) Wed May 07 1997 15:06

    Ability to load/unload by physical area.
    
    In large (>500GB) environments, it's too timeconsuming to manipulate
    an input stream to correctly affect parallel loading; so if a table
    has strict horizontal partitioning, load the physical areas by
    parallel RMU processes.
    
    For example:
    
    rmu /load root-file-spec table-name /area=physical-area input-file
    
    	and
    
    rmu /unload root-file-spec table-name /area=physical-area output-file
    
    locking of the target area could/should be exclusive on writes, 
    protected on reads; metadata (cardinality) updates should only be at
    the commit interval
    
    /los

T.R	Title	User	Personal Name	Date	Lines
802.1		NOVA::SMITHI	Don't understate or underestimate Rdb!	`Wed May 07 1997 15:24`	4
	Why isn't rmu/load/parallel sufficient? I really don't understand this suggestion Ian
802.2		NOVA::SANTIAGO	I was a teenage net-random.	`Thu May 08 1997 21:22`	30
	In enviroments where a database takes time series data (i.e., a warehouse), it's much more efficient (IMHO) to load the new target area, rather then attempt parallel loading against existing storage. The issue is specific to the case I'm working now, but I'd suspect very general, hence useful. My scenario is to build 3 databased: 1, 2, 6 year where each year is approx 300GB; data is obtained by month, each being 5.5GB. I'd like the ability to add the new area(s) or have them pre-created alter map (strict partitioning) to add new area(s) specifically load new area Most time, loads are yearly, that is I'd want to have 12 streams. The input datasets are by month, so a parallel load only sees 1 executor busy. Multiple parallel instances consume too many resources (i.e., 156 {12 * (1 parent + 12 siblings)} vs. 12 processes) at system and locking processes). Now as far as indexes, either drop them before hand. At current rates, it's about 40 hours to create the 'small' database's indexes. Perhaps a similar wish to create partitioned indexes? This is a simple design in which partition is by a single field, date; When builing a single year, it has proven to be more efficient to partition by other data, but union of several years via dbi still needs yearly partitions hence this partitioning scheme. /los
802.3		NOVA::SMITHI	Don't understate or underestimate Rdb!	`Fri May 09 1997 03:21`	17
	Getting closer... So what should RMU/LOAD be doing? Surely if all the data falls in the one partition only one partition will be loaded. DO you want multiple loaders writing to the same area? Please keep describing what you want to see. ~ Now as far as indexes, either drop them before hand. At current rates, ~ it's about 40 hours to create the 'small' database's indexes. Perhaps ~ a similar wish to create partitioned indexes? Yes we plan to add a parallel index create at some stage. It will be similar to the model used by RMU/LOAD/PARALLEL. In addition some new features in Rdb7 release 2 might be useful also... more on that when I set them in concrete. Ian
802.4	wishes 'A' and 'B'	NOVA::SANTIAGO	I was a teenage net-random.	`Mon May 12 1997 16:13`	57
	In review, I'm really seeking 2 wishes A. Rather than attempting a parallel load, use a single stream load to load a single physical area. B. For parallel loads, allow .PLAN file to name by executor input I'd like to give rmu/load a hint, that is, I want it to load 1 partition of a map where strict partitioning is set. All the data is to be loaded into a single physical area. For example, let's say in my example we have 12 months of data to load, and that partitions are named (for '1996) data_199601 ... data_199612 and a map stating store using date-field in data_199601 with limit of '31-jan-1996' ... in data_199612 with limit of '31-dec-1996' then doing a rmu /load root table /area=data_199601 199601.data /rms=... tells rmu to ignore/error all records not meant for the target area (I might even go so far as to say 'trust me' that input is fine;-) subsequently, start 11 more jobs on their respective months' partition and input stream; multiple loaders never write to the same area, except the root for cardinality if chosen. now the user can do this manually, that is start 12 individual and separate rmu/load processes (assuming wish 'A') -OR- wish 'B' being: 1. allow the .PLAN file to indicate to the loader/executor where its input stream is coming from. 2. have the parent read the input stream(s) per partition/executor (or each executor reads it's input stream independently). That is, the user can indicate the input stream(s) to be read and stored by partition it's hoped that loads by area in this fashion will approach sequential write speeds (i.e., a table with a simple map with no partitioning and to a single area file) and have minimal locking on the physical and logical area. Having 'A' and 'B' allow the widest flexibilty using the same basic capability being to load a physical vs. logical area directly. It would be introduced on simple [1 field] partitioning. A corrally to 'B' is to tell the parallel loaded is being which partitions aren't being loaded, thus saving resources. /los
802.5		NOVA::SMITHI	Don't understate or underestimate Rdb!	`Mon May 12 1997 17:18`	17
	I think what you really want is something like: rmu/load/exclude_partitions=(a,c) or rmu/load/include_partitions=(b) Any data for the exluded partitions would be (a) ignored, or (b) written to an exceptions file. Currently rmu/load establishes an executor for each partition. So this would be a way to firstly limit the number of executors when the data is partition specific [saves creating those idle processes], and secondly allows some data filtering. Did I get it? Ian
802.6	we're almost home!	NOVA::SANTIAGO	I was a teenage net-random.	`Mon May 12 1997 18:07`	48
	almost, what I'd want is rmu/load/include_partitions=(a) ... to NOT conflict with rmu/load/include_partitions=(b) ... and rmu/load/include_partitions=(c) ... this uses the parallel loader, but only 1 executor per job sequence; an alternative is to say (note where the input file is specified) rmu/load /plan=sys$input root .... ! plan file plan name = load_by_file_plan plan type = load plan parameters: database root file = root-name table name = table-name ! input file = input-file fields = (fields-list) transaction_type = protected ! buffers ... ! rest of database wide param end plan pareamters executor parameters: executor name = executor_1 ! place_only ! exception_file = <none> ! RUJ directory = <default> ! communications buffers = 12 input file = input-file OR (input-file, input-file, ...) wildcard? end executor parameters Yes, any data excluded (by excluding partitions) is ignored or written to an exception file; there nit here is allow multiple parallel loaders to coexist when each parallel load jobs is loading a strict set of partitions, where each partition is is loaded by an input file(s). /los
802.7		NOVA::SMITHI	Don't understate or underestimate Rdb!	`Mon May 12 1997 18:59`	20
	Still confused... Why use a parallel loader when there is only one partition? ~ almost, what I'd want is ~ ~ rmu/load/include_partitions=(a) ... ~ ~ to NOT conflict with ~ ~ rmu/load/include_partitions=(b) ... RMU Load already knows how to ready a partition for exclusive access :-) So this would be a given. ~ partitions, where each partition is is loaded by an input file(s). Right, each loader would require its own data file. Ian