T.R | Title | User | Personal Name | Date | Lines |
---|
205.1 | True, but the new field test kit addresses this | BROKE::PROTEAU | Jean-Claude Proteau | Fri Apr 12 1996 14:02 | 51 |
|
Massimo,
Let me make a guess about what the customer wanted to do.
DDAL has two basic methods of transfer, called extraction and
replication. Using the extraction method, more than one transfer can
execute at the same time, that is, in parallel. It is the replication
method that your customer must be talking about.
The replication method works with an Oracle Rdb source database and can
transfer data to a variety of target database types. The first time a
replication transfer executes, two steps are performed. The first is
the storing of transfer definition information in special Rdb system
tables in the source database. The second is the transfer of all the
data currently in the source tables (or some subset thereof, as specified
by the customer in the CREATE TRANSFER statement).
The first execution of a replication transfer is called an initial
replication transfer. The second, third, etc. execution is called an
update transfer. Update transfers behave differently from initial
replication transfers. During an update transfer, we don't transfer
all the data in a table, only the rows that have changed since the
previous transfer. It is this capability that makes the replication
method attractive to Rdb customers, because the amount of data to
transfer is considerably less.
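To make the difference between initial and update transfers concrete, here is a minimal Python sketch. It is a hypothetical model, not DDAL internals: it assumes every change to a source row is logged as a (sequence number, key) pair, and the names `initial_transfer`, `update_transfer` and `change_log` are illustrative only.

```python
# Hypothetical model of initial vs. update transfers (not DDAL internals).
# Assumption: each change to a source row is logged as (sequence_number, key).

def initial_transfer(source):
    """Initial replication: copy every row of the source table."""
    return dict(source)


def update_transfer(target, source, change_log, last_seq):
    """Update transfer: ship only rows changed after last_seq.

    Returns (rows_sent, new_last_seq)."""
    changed = {key for seq, key in change_log if seq > last_seq}
    for key in changed:
        if key in source:
            target[key] = source[key]  # row inserted or modified at the source
        else:
            target.pop(key, None)      # row deleted at the source
    new_last_seq = max([last_seq] + [seq for seq, _ in change_log])
    return len(changed), new_last_seq


if __name__ == "__main__":
    source = {1: "a", 2: "b", 3: "c"}
    target = initial_transfer(source)  # all 3 rows travel
    last_seq = 0

    source[2] = "B"  # modify
    del source[3]    # delete
    source[4] = "d"  # insert
    change_log = [(1, 2), (2, 3), (3, 4)]

    sent, last_seq = update_transfer(target, source, change_log, last_seq)
    print(sent, target == source)  # 3 True
```

Only the three changed keys are shipped, and calling `update_transfer` again with no new log entries ships nothing, which is the saving described above.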
Now to get to the point. Using DDAL version 6.0 or some earlier
version, it is true that initial replication transfers that use the
same source database do not execute in parallel. Storing the
transfer definition in the source database requires exclusive access to
that database. As a result, if you start several initial replication
transfers on the same source database at the same time, one will
execute and finish execution before another begins. It is this
behavior to which your customer is referring, I presume.
There is some good news, though. I have been working on the follow-on
release to version 6.0 of DDAL. I've added a number of performance
options in DDAL to speed up product execution. One of these options
allows initial replication transfers to execute in parallel during the
data transfer operation. The storing of the transfer definition still
has to be done serially; but that typically takes a few seconds to
perform. Shortly, we expect to announce the availability, through Oracle,
of the field test (beta) kits which contain these changes.
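The behavior of the follow-on release can be modeled with a small Python sketch: one lock, a hypothetical stand-in for the exclusive source-database access, serializes the definition step, while the data copy runs outside the lock and may overlap with other transfers. The function names and sleep times are illustrative assumptions, not DDAL code.

```python
import threading
import time

# Hypothetical model (not DDAL code): one lock stands in for the exclusive
# source-database access needed to store a transfer definition.
DEFINITION_LOCK = threading.Lock()


def initial_replication(name, events, events_lock, copy_seconds):
    def record(kind):
        with events_lock:
            events.append((kind, name))

    # Phase 1: store the transfer definition; only one transfer at a time.
    with DEFINITION_LOCK:
        record("def_start")
        time.sleep(0.01)  # short stand-in for "a few seconds"
        record("def_end")

    # Phase 2: copy the table data; this phase may overlap across transfers.
    record("copy_start")
    time.sleep(copy_seconds)
    record("copy_end")


def run_transfers(names, copy_seconds=0.1):
    """Start one thread per transfer and return the ordered event log."""
    events, events_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=initial_replication,
                         args=(n, events, events_lock, copy_seconds))
        for n in names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


def max_overlap(events, start, end):
    """Largest number of transfers inside the start/end phase at once."""
    active = peak = 0
    for kind, _ in events:
        if kind == start:
            active += 1
            peak = max(peak, active)
        elif kind == end:
            active -= 1
    return peak


if __name__ == "__main__":
    events = run_transfers(["T1", "T2", "T3", "T4"])
    # Definition phase: never more than one transfer at a time.
    print("definitions at once:", max_overlap(events, "def_start", "def_end"))
    # Copy phase: normally several transfers copying at the same time.
    print("copies at once:", max_overlap(events, "copy_start", "copy_end"))
```

Holding the lock across both phases instead would reproduce the version 6.0 behavior, where each initial replication transfer finishes before the next one begins.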
If my description of the problem does not match what your customer was
seeing, please send me some more details so that I'll know how to respond.
Regards,
Claude Proteau
|
205.2 | news | ITVMS1::MRESNATI | | Fri Apr 12 1996 15:57 | 19 |
|
Jean Luc,
thanks for your quick and valuable answer.
In the meantime I've received some news about the customer's problem.
Ivo Tota (Digital Turin, but Oracle Turin from next week) has recently
opened (the latest one today) a lot of calls about Data Distributor problems.
The latest problem is a crash of the DDAL monitor process.
I just spoke with the DEC escalation manager (Bellotti), and she told me that
this call is in escalation.
Can you give me any news about these problems?
Currently, are you with DEC or with Oracle?
Thanks very much for your kind cooperation.
/Massimo Resnati.
|
205.3 | Just looking at the problem report now | BROKE::PROTEAU | Jean-Claude Proteau | Fri Apr 12 1996 19:44 | 15 |
|
Massimo,
I received the IPMT problem report a few hours ago and am only starting
to read it now. I plan to reply through the IPMT channel, assuming
that that mechanism still works. If not, I will send mail directly to
Ivo Tota if possible. Is it possible to send mail to Ivo via the
Internet? If so, please post a reply with his address. If not, does
he have an account on the Digital engineering net? If so, I can send
mail to him there. Again, please post his E-mail address if you know
it.
Regards,
Claude Proteau
|
205.4 | news... | ITVMS1::MRESNATI | | Mon Apr 15 1996 07:41 | 33 |
|
Jean Luc,
many thanks for your quick reply.
>> I received the IPMT....
YES.
On Friday, at about 17:00 (Italian time), I spoke with Silvana Bellotti
(Digital Escalation Manager), and we agreed to keep using the IPMT channel
(if possible).
>> If not, I will send mail directly to Ivo....
I'm not sure, but I suppose that Ivo is still a Digital employee this
week and will only join Oracle next week, and I don't know what his role at
Oracle will be (Sales, Sales Consultant, and so on).
So I'm not sure whether he can receive or read mail.
For this reason I think that the best choice is to continue to
communicate via IPMT (if that is possible):
Jean Luc ----> IPMT ----> Silvana (DEC) ----> Massimo
If it is not possible to communicate via IPMT, you can send me mail at one
of these Internet addresses: mresnati@it.oracle.com or supporto@it.oracle.com
(I prefer the first address), and I will try to handle the problem.
In the meantime I'll try to contact Ivo.
Thanks very much for your help and sorry for the trouble.
/Massimo (Oracle Rdb Support - Milan)
|
205.5 | news... | ITVMS1::MRESNATI | | Mon Apr 15 1996 08:17 | 13 |
|
Jean Luc,
I have some news about the problem.
I contacted Ivo, and today is his last day at DEC.
I can confirm (having also spoken with Ivo) that for now the best thing is
to use IPMT if possible (Ivo can still read it today); if not, you can
send me mail and I will pass it on to Ivo.
As soon as Ivo has an Oracle mail account, I will send you the address.
Regards.
/Massimo
|
205.6 | Status of Deadlock Problem | BROKE::PROTEAU | Jean-Claude Proteau | Mon Apr 15 1996 17:34 | 43 |
|
Massimo,
I have to reply to you since we currently have no way to enter replies into
the Digital IPMT system. We're trying to find out how the Rdb engineering
team has handled this this past year. As you can see, we are in a transition
phase. Therefore, please relay my comments and questions to Ivo Tota. Once
Ivo becomes an Oracle employee and gets an Oracle Office account, he and I
will be able to send mail to each other directly. My mail address on the
internet is jproteau@us.oracle.com, and within Oracle Office it is
jproteau.us.oracle.com.
The problem he reported to us was for Cassa di Risparmio in Torino. A nice
city, Torino. I was there some years ago. Anyway, the customer experienced
a problem using our product, DEC Data Distributor, in a two-machine AXP cluster.
With Data Distributor monitors running on each machine, the customer tried to
create schedule definitions at the same time on each machine. This caused
one of the Data Distributor monitors to crash because of a resource deadlock.
We intend to try to reproduce this problem, but not right away. Our current
highest priority is to get Data Distributor 6.0 ready for a general release
as an Oracle product. If we don't do this, Oracle cannot sell the Data Distributor
kits to anyone. This activity might take as long as a week, and then we
should be done. After that I should have time to research the problem for
Ivo's customer.
It is difficult for me to make a judgment call on the severity of the problem
as it relates to this customer. On the surface, the problem can be avoided
by simply having people not create transfer schedule definitions at the same
time. However, I don't know in practice how simple that would be to put into
effect. I also don't know from the problem report how often this problem
has arisen at the customer site. If it is an occasional problem, that is one
thing. If it happens so often as to seriously disrupt the customer's production
operations, that is another matter.
If you or Ivo wish to appeal my decision to defer work on this problem for the
moment, please consult with my manager, Steve Serra, who can be reached by
Internet mail at sserra@us.oracle.com.
Regards,
Claude Proteau
|
205.7 | Deadlock Problem | itvms1.it.oracle.com::MRESNATI | | Wed Apr 17 1996 09:42 | 18 |
|
Jean-Claude,
I'm sorry for the late reply, but I was out at a customer site and only came
back yesterday afternoon.
Yesterday evening I spoke with Ivo, who was at the Oracle office (his first
day), but he still did not have an Oracle Mail/Office account. He asked me to
send mail to sserra.us.oracle.com to obtain a v6.0 beta test kit.
Today Ivo is at Cassa Risparmio Torino (CRT).
Thanks for your cooperation.
/Massimo
>> A nice city, Torino
Yes, it is a nice city, and a lot of Rdb engineers have spent a lot of time
here!! (P. Vigier, P. Grice, L. Carpenter, A. Godfrind....) :-) :-)
|
205.8 | test | itvms1.it.oracle.com::MRESNATI | | Thu Apr 18 1996 09:05 | 13 |
|
Jean Luc,
sorry for the trouble, but I just spoke with Ivo: he would like to run a
test at CRT and would like to know whether it's possible.
He wants to start a monitor process on each of the two cluster nodes, with
each monitor process using a different DDAL$TR_DB database.
He wants to copy DDAL$TR_DB into a node-specific directory to avoid the
deadlock conflict.
Do you think that's possible?
Thanks in advance.
/Massimo
|
205.9 | It's possible | BROKE::PROTEAU | Jean-Claude Proteau | Thu Apr 18 1996 11:22 | 23 |
|
Massimo,
What Ivo suggests will probably work. We did not design Data
Distributor with that scenario in mind. After all, if one of your
machines goes down, the transfers then cannot be executed from one of
the remaining machines in your cluster. However, if that is less
important than avoiding the deadlock problem, that's the customer's
choice.
They should also note that stopping the transfer monitor, using the
DDAL$STOP_TR_MON.COM procedure, will only stop the monitor on the
local machine, not all monitors in the cluster. That is a consequence
of using more than one transfer database.
Each transfer database should have a set of transfer definitions for
its own machine. Do not duplicate the definitions. You obviously
don't want two machines executing the same transfer at the same time.
Well, that's all I can think of for the moment.
Regards,
Claude
|
205.10 | ok | itvms1.it.oracle.com::MRESNATI | | Thu Apr 18 1996 13:30 | 9 |
|
Claude,
I will pass this information on to Ivo.
Thanks very much for your kind cooperation,
regards
/Massimo
|
205.11 | I'm Oracle, too. | itvms1.it.oracle.com::ITOTA | | Fri Apr 19 1996 15:53 | 69 |
|
Hi Claude,
thanks very much for the time you and Massimo have spent on this.
Now I'm working for Oracle.
Regarding the problems I experienced at Cassa di Risparmio, this is the
current situation:
1) Monitor crash problem:
We solved (I hope; we tested it, anyway) the crash problem
by using two ddal$tr_db databases (one per DDAL monitor process).
I suggest that it be checked and possibly fixed in the
next release (it may happen if you need to define a lot of transfers
in a cluster environment).
2) Initial replication transfer problem:
Is the FT kit available anywhere?
The customer would like to test it to decide on some actions for
the near future.
Another question: is the ddal$max_copy limit still 40 in the next release?
One more question:
is there a customer anywhere who needs to replicate
data to 400 or more sites (as Cassa di Risparmio does)?
Currently, to solve the customer's problems (a lot of sites, one
source db, and slow lines to send data), we designed (and
created) this configuration:
1 source db
40 first-level dbs (on the same AXP as the source, so very quick)
400 second-level dbs (at the customer sites)
In this way, each of the 40 first-level dbs distributes data
to 10 second-level dbs.
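The two-level fan-out in this configuration (1 source db, 40 first-level dbs, 400 second-level dbs) is easy to sanity-check. The Python sketch below is illustrative only: the database names (`L1_00`, `site_000`, ...) are hypothetical, and it simply gives each first-level db a contiguous block of 10 sites.

```python
# Hypothetical sketch of the 1 -> 40 -> 400 replication tree.
# Database names are invented for illustration.

def build_fanout(first_level=40, sites=400):
    """Map each first-level db to the second-level (site) dbs it feeds."""
    if sites % first_level:
        raise ValueError("sites must divide evenly among first-level dbs")
    per_node = sites // first_level
    tree = {}
    for i in range(first_level):
        tree[f"L1_{i:02d}"] = [f"site_{i * per_node + j:03d}"
                               for j in range(per_node)]
    return tree


if __name__ == "__main__":
    tree = build_fanout()
    # Tier 1: the source feeds 40 first-level dbs.
    # Tier 2: each first-level db feeds 10 sites.
    print(len(tree), {len(v) for v in tree.values()})  # 40 {10}
```

The point of the layout is that the central source database only drives 40 transfers over fast local connections, while the slow lines to the 400 sites are served from the first-level dbs.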
In this way, during the initial replication phase, we succeeded in
obtaining 40 copy processes formally working in parallel
(with neither lock nor deadlock problems).
I think this is the only way the customer can start with 400 sites in
the near future.
Is it the right approach, in your opinion?
Anyway, I would like to know whether the deadlock problem on
the source database (rdb$changes table)
will be solved in the next release
(reference: my notes 194.4, 194.5, 194.6, 194.7).
I just want to tell the customer the right thing, since right now they can't
start more than one transfer to the same target db at the same time.
Thanks a lot for your patience,
Ivo
|
205.12 | Some answers, some questions | BROKE::PROTEAU | Jean-Claude Proteau | Sat Apr 20 1996 03:13 | 78 |
|
> 1) monitor crash problem :
>
> we solved ( I hope, we tested it , anyway ) the crash problem
> using 2 ddal$tr_db ( one per ddal monitor process)
>
> I suggest it will be checked and possibly solved for the
> next release ( if you need to define a lot of transfer in a
> cluster environment it may happen )
Do you have any idea how easy the problem is to reproduce and how often the
customer encountered it?
> 2) initial replication transfer problem
>
> May the FT kit is available anywhere?
> Customer would test it to decide some actions for the next
> future.
Contact my manager, Steve Serra, about obtaining a field test (beta) kit.
Steve can be reached on the Internet at sserra@us.oracle.com.
> Another question , is ddal$max_copy limit 40 also in the next release ?
No. I just checked our code and the current limit is still 20. Did one of
us make some comment somewhere that it was changing to 40? We might consider
such a change if there were a demonstrated need for it.
> One question more:
>
> is there a customer anywhere that need to replicate
> his datas on 400 or more sites ( as Cassa di Risparmio need )?
Yes, Belgian Railways for one. I think they replicate to 600 sites.
> Actually, to solve customer problems ( a lot of sites, one
> source db and slow lines to send data ) , we decided ( and
> created ) this configuration:
>
> 1 source db
>
> 40 first level dbs ( on the same AXP as source is, so very quick )
>
> 400 second level dbs ( on the customer sites )
>
> In this way anyone of 40 first level db will distribute data
> on 10 second level dbs.
>
>
> In this way , during initial replication phase, we succeed in
> obtaining a formal 40 copy processes working in parallel
> ( with neither lock nor deadlock problem ).
>
> I think it's the only way customer can start with 400 in
> the next future.
> Is it the right way, in your opinion?
Having 40 transfers running concurrently seems a bit too much for a single
processor and database, but I'm only guessing. I have not personally
performed parallel replication tests with that many transfers. I don't
know if the system will be able to handle it. First, of course, you'll
need the beta kit with the changes to allow parallel operation. Then
you should test 20 transfers and check the utilization of system resources,
CPU cycles and disk I/O, to see if you are coming close to saturation.
There are also shared system resources in VMS that might become a
bottleneck. If 20 transfers seem to work well and there appears to be room to grow,
we can talk about raising the built-in limit to a higher value.
> Anyway, I would like to know if a deadlock problem on
> the source database ( rdb$changes table )
> will be solved in the next release
> ( reference, my note 194.4,5,6,7 )
I re-read the notes. My suggestion was that you contact my manager. I also
asked about deferred snapshots on the source database. Does the customer
use that option? Also, I was confused about why you were executing two transfers
to the same target database. I didn't think that was your intent.
Claude
|
205.13 | Other details | itvms1.it.oracle.com::ITOTA | | Mon Apr 22 1996 14:28 | 78 |
|
>Do you have any idea how easy the problem is to reproduce and how often the
>customer encountered it?
I simply wrote two command procedures to create and schedule, for instance,
five transfers each.
By running these procedures at the same time (the first on AXP1, the
second on AXP2, the two cluster members), the problem was easily
reproduced.
Of course, the two procedures can use different target databases, because
the deadlock problem is in the DDAL$TR_DB database.
>No. I just checked our code and the current limit is still 20. Did one of
>us make some comment somewhere that it was changing to 40? We might consider
>such a change if there were a demonstrated need for it.
My mistake. In the customer environment we have two AXPs, so for the
customer the limit is effectively 40.
Anyway, I think that 20 may not be very many.
Consider our case: the lines are very slow, and during some
initial replication transfers the copy processes sit in LEF state, waiting
for data to arrive at the target database.
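The arithmetic behind "the limit is effectively 40" can be written down as a small Python sketch. It assumes, as described in this exchange, that the ddal$max_copy limit of 20 applies per node (one monitor per node), so the cluster-wide ceiling is nodes times the per-node limit; the function names are hypothetical.

```python
import math

# Assumption (from this thread): the copy-process limit of 20 applies per node.
MAX_COPY_PER_NODE = 20


def cluster_copy_capacity(nodes, max_copy_per_node=MAX_COPY_PER_NODE):
    """Concurrent copy processes available across the whole cluster."""
    return nodes * max_copy_per_node


def batches_needed(transfers, nodes, max_copy_per_node=MAX_COPY_PER_NODE):
    """Rounds of initial replication if every slot stays busy until done."""
    return math.ceil(transfers / cluster_copy_capacity(nodes, max_copy_per_node))


if __name__ == "__main__":
    print(cluster_copy_capacity(2))  # 40
    print(batches_needed(40, 2))     # 1: the 40 first-level dbs
    print(batches_needed(400, 2))    # 10: all 400 sites fed directly
```

When copy processes spend most of their time in LEF waiting on slow lines, each batch is bounded by line speed rather than CPU, which is the argument for a higher or adjustable limit.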
>Contact my manager, Steve Serra, about obtaining a field test (beta) kit.
>Steve can be reached on the Internet at sserra@us.oracle.com.
Last week our colleague M. Resnati sent mail to your manager to obtain
this kit. So far he has not received any reply from S. Serra.
>Yes, Belgian Railways for one. I think they replicate to 600 sites.
Does this customer use one source and 600 target dbs?
Or rather, do you have any other information about that customer's
Data Distributor configuration?
>Having 40 transfers running concurrently seems a bit too much for a single
>processor and database, but I'm only guessing. I have not personally
>performed parallel replication tests with that many transfers. I don't
>know if the system will be able to handle it. first, of course, you'll
>need the beta kit with the changes to allow parallel operation. Then
>you should test 20 transfers and check the utilization of system resources:
>cpu cycles and disk I/O to see if you are coming close to saturation.
>There are, also, shared system resources in VMS which might also become a
>bottleneck. If 20 seem to work well and there appears to be room to grow,
>we can talk about raising the built-in limit to a higher value.
The customer environment is a two-AXP cluster, so 20 processes on each node.
As I said before, the lines are very slow, so there is no system bottleneck
when running 20 processes in parallel.
I think a higher built-in limit would be better (for several reasons:
newer, more powerful CPUs, lines that may be slow, and so on), together with
a way to resize it if needed.
>I re-read the notes. My suggestion was that you contact my manager. I also
>asked about deferred snapshots on the source database. Does the customer
>use that option? Also, I was confused why you were executing two transfers
>to the same target database. I didn't think that that was your intent?
Deferred snapshots were not enabled on the source database.
The reason the customer created two transfers to the same target database
is this: if you need to alter a table that belongs to a transfer, you need to
drop and re-create that transfer. By defining more than one transfer, it is
possible to avoid repeating the initial replication phase for the tables in
the other transfers when you need to alter a table.
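The reasoning behind splitting tables across transfers can be illustrated with a short Python sketch. It assumes, as described here, that altering a table means dropping and re-creating the transfer that contains it, so that transfer's next run is an initial replication of all its tables. The transfer and table names are hypothetical.

```python
# Hypothetical illustration: tables grouped into separate transfers that all
# target the same database. Transfer and table names are invented.
TRANSFERS = {
    "XFER_A": ["ACCOUNTS", "CUSTOMERS"],
    "XFER_B": ["BRANCHES", "RATES"],
}


def tables_to_recopy(transfers, altered_table):
    """Tables that get a fresh initial replication after altering one table.

    Assumes the transfer owning the table is dropped and re-created, so its
    next execution is an initial replication of every table it contains."""
    for tables in transfers.values():
        if altered_table in tables:
            return sorted(tables)
    return []


if __name__ == "__main__":
    print(tables_to_recopy(TRANSFERS, "RATES"))  # ['BRANCHES', 'RATES']
    # With one transfer holding all four tables, all four would be re-copied.
    single = {"XFER_ALL": [t for ts in TRANSFERS.values() for t in ts]}
    print(tables_to_recopy(single, "RATES"))
```

The trade-off, as the rest of this note shows, is that several transfers now share one target database and must not be started at the same time.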
The mistake was to start these transfers at the same time.
OK, I agree, but from the customer's point of view it should be possible
to define and start more than one transfer to the same target database at the
same time.
Anyway, I'd like to know whether there is anything new in the next release
regarding this problem.
Thanks very much,
Ivo
|
205.14 | It's available in the current T7.0-3 beta kit | BROKE::PROTEAU | Jean-Claude Proteau | Fri Apr 26 1996 15:36 | 9 |
|
re: .-1
Ivo,
The new field test (beta) release, which is being called version
T7.0-3, has some changes in it to support parallelism of replication
transfers from the same source database.
Claude
|