T.R | Title | User | Personal Name | Date | Lines |
---|
802.1 | | NOVA::SMITHI | Don't understate or underestimate Rdb! | Wed May 07 1997 15:24 | 4 |
| Why isn't rmu/load/parallel sufficient?
I really don't understand this suggestion
Ian
|
802.2 | | NOVA::SANTIAGO | I was a teenage net-random. | Thu May 08 1997 21:22 | 30 |
| In enviroments where a database takes time series data (i.e., a
warehouse), it's much more efficient (IMHO) to load the new target
area, rather then attempt parallel loading against existing storage.
The issue is specific to the case I'm working now, but I'd suspect very
general, hence useful.
My scenario is to build 3 databased: 1, 2, 6 year where each year is
approx 300GB; data is obtained by month, each being 5.5GB. I'd
like the ability to
add the new area(s) or have them pre-created
alter map (strict partitioning) to add new area(s)
specifically load new area
Most time, loads are yearly, that is I'd want to have 12 streams. The
input datasets are by month, so a parallel load only sees 1 executor
busy. Multiple parallel instances consume too many resources (i.e.,
156 {12 * (1 parent + 12 siblings)} vs. 12 processes) at system and
locking processes).
Now as far as indexes, either drop them before hand. At current rates,
it's about 40 hours to create the 'small' database's indexes. Perhaps
a similar wish to create partitioned indexes?
This is a simple design in which partition is by a single field, date;
When builing a single year, it has proven to be more efficient to
partition by other data, but union of several years via dbi still needs
yearly partitions hence this partitioning scheme.
/los
|
802.3 | | NOVA::SMITHI | Don't understate or underestimate Rdb! | Fri May 09 1997 03:21 | 17 |
| Getting closer...
So what should RMU/LOAD be doing? Surely if all the data falls in the one
partition only one partition will be loaded. DO you want multiple loaders
writing to the same area?
Please keep describing what you want to see.
~ Now as far as indexes, either drop them before hand. At current rates,
~ it's about 40 hours to create the 'small' database's indexes. Perhaps
~ a similar wish to create partitioned indexes?
Yes we plan to add a parallel index create at some stage. It will be similar
to the model used by RMU/LOAD/PARALLEL. In addition some new features in Rdb7
release 2 might be useful also... more on that when I set them in concrete.
Ian
|
802.4 | wishes 'A' and 'B' | NOVA::SANTIAGO | I was a teenage net-random. | Mon May 12 1997 16:13 | 57 |
| In review, I'm really seeking 2 wishes
A. Rather than attempting a parallel load, use a single stream load to
load a single physical area.
B. For parallel loads, allow .PLAN file to name by executor input
I'd like to give rmu/load a hint, that is, I want it to load 1 partition
of a map where strict partitioning is set. All the data is to be
loaded into a single physical area. For example, let's say in my
example we have 12 months of data to load, and that partitions are
named (for '1996)
data_199601 ... data_199612
and a map stating
store using date-field
in data_199601 with limit of '31-jan-1996'
...
in data_199612 with limit of '31-dec-1996'
then doing a
rmu /load root table /area=data_199601 199601.data /rms=...
tells rmu to ignore/error all records not meant for the target area
(I might even go so far as to say 'trust me' that input is fine;-)
subsequently, start 11 more jobs on their respective months' partition
and input stream; multiple loaders never write to the same area, except
the root for cardinality if chosen.
now the user can do this manually, that is start 12 individual and
separate rmu/load processes (assuming wish 'A') -OR- wish 'B' being:
1. allow the .PLAN file to indicate to the loader/executor where
its input stream is coming from.
2. have the parent read the input stream(s) per partition/executor
(or each executor reads it's input stream independently). That
is, the user can indicate the input stream(s) to be read and
stored by partition
it's hoped that loads by area in this fashion will approach sequential
write speeds (i.e., a table with a simple map with no partitioning and
to a single area file) and have minimal locking on the physical and
logical area.
Having 'A' and 'B' allow the widest flexibilty using the same basic
capability being to load a physical vs. logical area directly. It would
be introduced on simple [1 field] partitioning.
A corrally to 'B' is to tell the parallel loaded is being which
partitions aren't being loaded, thus saving resources.
/los
|
802.5 | | NOVA::SMITHI | Don't understate or underestimate Rdb! | Mon May 12 1997 17:18 | 17 |
| I think what you really want is something like:
rmu/load/exclude_partitions=(a,c)
or
rmu/load/include_partitions=(b)
Any data for the exluded partitions would be (a) ignored, or (b) written to an
exceptions file.
Currently rmu/load establishes an executor for each partition. So this would
be a way to firstly limit the number of executors when the data is partition
specific [saves creating those idle processes], and secondly allows some data
filtering.
Did I get it?
Ian
|
802.6 | we're almost home! | NOVA::SANTIAGO | I was a teenage net-random. | Mon May 12 1997 18:07 | 48 |
| almost, what I'd want is
rmu/load/include_partitions=(a) ...
to *NOT* conflict with
rmu/load/include_partitions=(b) ...
and
rmu/load/include_partitions=(c) ...
this uses the parallel loader, but only 1 executor per job sequence; an
alternative is to say (note where the input file is specified)
rmu/load /plan=sys$input root ....
! plan file
plan name = load_by_file_plan
plan type = load
plan parameters:
database root file = root-name
table name = table-name
! input file = input-file
fields = (fields-list)
transaction_type = protected
! buffers
...
! rest of database wide param
end plan pareamters
executor parameters:
executor name = executor_1
! place_only
! exception_file = <none>
! RUJ directory = <default>
! communications buffers = 12
input file = input-file *OR* (input-file, input-file, ...) wildcard?
end executor parameters
Yes, any data excluded (by excluding partitions) is ignored or written
to an exception file; there nit here is allow multiple parallel loaders
to coexist when each parallel load jobs is loading a strict set of
partitions, where each partition is is loaded by an input file(s).
/los
|
802.7 | | NOVA::SMITHI | Don't understate or underestimate Rdb! | Mon May 12 1997 18:59 | 20 |
| Still confused...
Why use a parallel loader when there is only one partition?
~ almost, what I'd want is
~
~ rmu/load/include_partitions=(a) ...
~
~ to *NOT* conflict with
~
~ rmu/load/include_partitions=(b) ...
RMU Load already knows how to ready a partition for exclusive access :-)
So this would be a given.
~ partitions, where each partition is is loaded by an input file(s).
Right, each loader would require its own data file.
Ian
|