
Conference orarep::nomahs::dec_data_distributor

Title:	The Replication Option for Rdb
Notice:	Product renamed to Replication Option for Rdb
Moderator:	BROKE::PROTEAU

Created:	Wed Mar 02 1994
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	287
Total number of notes:	1231

205.0. "DDAL and parallel transfer...." by ITVMS1::MRESNATI () Fri Apr 12 1996 10:38

Hi,

A customer has asked me a question about DDAL, but I have neither the manuals
(I'm at Oracle now) nor the skills to answer.

The questions:

The customer told the Oracle salesman that DDAL doesn't distribute data
in parallel.
He has a big center with a lot of branches and would like to distribute
data in parallel (at the same time) to all the branches, but it seems that
DDAL transfers data to them one at a time.
Is this normal behavior?
Where can I find documentation on the Oracle net?
Does Stars DB have DDAL articles?

Sorry for the trouble and thanks in advance for your cooperation.

/Massimo

T.R	Title	User	Personal Name	Date	Lines
205.1	True, but the new field test kit addresses this	BROKE::PROTEAU	Jean-Claude Proteau	`Fri Apr 12 1996 14:02`	51
	Massimo,

	Let me make a guess about what the customer wanted to do. DDAL has two basic methods of transfer, called extraction and replication. Using the extraction method, more than one transfer can execute at the same time, that is, in parallel. It is the replication method that your customer must be talking about.

	The replication method works with an Oracle Rdb source database and can transfer data to a variety of target database types. The first time a replication transfer executes, two steps are performed. The first is the storing of transfer definition information in special Rdb system tables in the source database. The second is the transfer of all the data currently in the source tables (or some subset thereof as specified by the customer in the CREATE TRANSFER statement).

	The first execution of a replication transfer is called an initial replication transfer. The second, third, and later executions are called update transfers. Update transfers behave differently from initial replication transfers. During an update transfer, we don't transfer all the data in a table, only what has changed since the previous transfer. It is this capability that makes the replication method attractive to Rdb customers, because the amount of data to transfer is considerably less.

	Now to get to the point. Using DDAL version 6.0 or some earlier version, it is true that initial replication transfers that use the same source database do not execute in parallel. The storing of the transfer definition in the source database requires exclusive access to that database. As a result, if you start several initial replication transfers on the same source database at the same time, one will execute and finish execution before another begins. It is this behavior to which your customer is referring, I presume.

	There is some good news, though. I have been working on the follow-on release to version 6.0 of DDAL. I've added a number of performance options in DDAL to speed up product execution. One of these options allows initial replication transfers to execute in parallel during the data transfer operation. The storing of the transfer definition still has to be done serially, but that typically takes only a few seconds to perform. Shortly, we expect to announce the availability, through Oracle, of the field test (beta) kits which contain these changes.

	If my description of the problem does not match what your customer was seeing, please ask for some details so that I'll know how to respond.

	Regards, Claude Proteau
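The behavior described in 205.1 (definitions stored one at a time, data copied in parallel) can be pictured with a small model. This is only a sketch in Python, not DDAL code: `definition_lock`, `initial_replication`, and `run_all` are hypothetical names, and the lock merely stands in for the exclusive access that storing a transfer definition requires on the source database.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the exclusive access needed while a transfer
# definition is stored in the source database.
definition_lock = threading.Lock()

events = []                      # ordered log of (phase, transfer, marker)
events_guard = threading.Lock()  # protects the log itself


def log(phase, name, marker):
    with events_guard:
        events.append((phase, name, marker))


def initial_replication(name):
    # Step 1: store the transfer definition. Only one transfer at a time
    # may be in this step, so it runs under the lock.
    with definition_lock:
        log("define", name, "start")
        log("define", name, "end")
    # Step 2: copy the data. In this model the copies of different
    # transfers are free to overlap.
    log("copy", name, "start")
    log("copy", name, "end")
    return name


def run_all(names):
    # One worker per transfer, so every transfer can start immediately.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        return list(pool.map(initial_replication, names))
```

In this model the "define" entries in `events` never interleave between transfers, while the "copy" entries may, which mirrors the serial and parallel phases described in the reply.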
205.2	news	ITVMS1::MRESNATI		`Fri Apr 12 1996 15:57`	19
	Jean Luc, thanks for your quick and valuable answer. In the meantime I've received some news about the customer problem. Ivo Tota (Digital Turin, but Oracle Turin from next week) has recently opened a lot of calls about Data Distribution problems, the last one today. The latest problem is a crash of the DDAL monitor process. I just spoke with the DEC escalation manager (Bellotti) and she told me that this call is in escalation. Can you help me get some news about these problems? Are you currently with DEC or Oracle? Thanks very much for your kind cooperation. /Massimo Resnati.
205.3	Just looking at the problem report now	BROKE::PROTEAU	Jean-Claude Proteau	`Fri Apr 12 1996 19:44`	15
	Massimo, I received the IPMT problem report a few hours ago and am only starting to read it now. I plan to reply through the IPMT channel, assuming that that mechanism still works. If not, I will send mail directly to Ivo Tota if possible. Is it possible to send mail to Ivo via the Internet? If so, please post a reply with his address. If not, does he have an account on the Digital engineering net? If so, I can send mail to him there. Again, please post his E-mail address if you know it. Regards, Claude Proteau
205.4	news...	ITVMS1::MRESNATI		`Mon Apr 15 1996 07:41`	33
	Jean Luc,

	many thanks for your quick reply.

	>> I received the IPMT....

	Yes. On Friday at about 17:00 (Italian time) I spoke with Silvana Bellotti (Digital Exception Manager) and we agreed to keep using the IPMT channel (if possible).

	>> If not, I will send mail directly to Ivo....

	I'm not sure, but I suppose that Ivo is still a Digital employee this week and will only join Oracle next week, and I don't know what he'll do at Oracle (Sales, Sales Consultant and so on). So I'm not sure whether he can receive or read mail. For this reason I think the best choice is to continue to communicate via IPMT (if that is possible):

	Jean Luc ----> IPMT ----> Silvana (DEC) ----> Massimo

	If it is not possible to communicate via IPMT, you can send me mail at one of these Internet addresses: mresnati@it.oracle.com or supporto@it.oracle.com (I prefer the first address) and I will try to manage the problem. In the meantime I'll try to contact Ivo.

	Thanks very much for your help and sorry for the trouble.

	/Massimo (Oracle Rdb Support - Milan)
205.5	news...	ITVMS1::MRESNATI		`Mon Apr 15 1996 08:17`	13
	Jean Luc, I have some news about the problem. I contacted Ivo, and today is his last day at DEC. I can confirm (having also spoken with Ivo) that for now the best thing is to use IPMT if possible (Ivo can still read it today); if not, you can send me mail and I will pass it on to Ivo. As soon as Ivo has an Oracle mail address I will post it for you. Regards. /Massimo
205.6	Status of Deadlock Problem	BROKE::PROTEAU	Jean-Claude Proteau	`Mon Apr 15 1996 17:34`	43
	Massimo,

	I have to reply to you since we currently have no way to enter replies into the Digital IPMT system. We're trying to find out how the Rdb engineering team has handled this over the past year. As you can see, we are in a transition phase. Therefore, please relay my comments and questions to Ivo Tota. Once Ivo becomes an Oracle employee and gets an Oracle Office account, he and I will be able to send mail to each other directly. My mail address on the Internet is jproteau@us.oracle.com, and within Oracle Office it is jproteau.us.oracle.com.

	The problem he reported to us was for Cassa di Risparmio in Torino. A nice city, Torino. I was there some years ago. Anyway, the customer experienced a problem using our product, DEC Data Distributor, in a 2 machine AXP cluster. With Data Distributor monitors running on each machine, the customer tried to create schedule definitions at the same time on each machine. This caused one of the Data Distributor monitors to crash because of a resource deadlock.

	We intend to try to reproduce this problem, but not right away. Our current highest priority is to get Data Distributor 6.0 ready for a general release as an Oracle product. If we don't do this, Oracle cannot sell the Data Distributor kits to anyone. This activity might take as long as a week and then we should be done. After that I should have time to research the problem for Ivo's customer.

	It is difficult for me to make a judgment call on the severity of the problem as it relates to this customer. On the surface, the problem can be avoided by simply having people not create transfer schedule definitions at the same time. However, I don't know in practice how simple that would be to put into effect. I also don't know from the problem report how often this problem has arisen at the customer site. If it is an occasional problem, that is one thing. If it happens so often as to positively disrupt customer production operation, that is another matter.

	If you or Ivo wish to appeal my decision to defer work on this problem for the moment, please consult with my manager, Steve Serra, who can be reached by Internet mail at sserra@us.oracle.com.

	Regards, Claude Proteau
205.7	Deadlock Problem	itvms1.it.oracle.com::MRESNATI		`Wed Apr 17 1996 09:42`	18
	Jean-Claude, I'm sorry for the delay, but I was out at a customer site and only came back yesterday afternoon. Yesterday evening I spoke with Ivo, who was in the Oracle office (his first day) but still did not have an Oracle Mail/Office account, and he told me to send mail to sserra.us.oracle.com to receive a v6.0 Beta Test kit. Today Ivo is at Cassa Risparmio Torino (CRT). Thanks for your cooperation. /Massimo

	>> A nice city, Torino

	Yes, it is a nice city, and a lot of Rdb engineers have spent a lot of time here!! (P.Vigier, P.Grice, L.Carpenter, A.Godfrind....) :-) :-)
205.8	test	itvms1.it.oracle.com::MRESNATI		`Thu Apr 18 1996 09:05`	13
	Jean Luc, sorry for the trouble, but I just spoke with Ivo, and he would like to run a test at CRT and to know whether it is possible. He wants to start a monitor process on each of the two cluster nodes, with each monitor process using a different DDAL$TR_DB database. He wants to copy DDAL$TR_DB into a specific directory for each monitor to avoid the deadlock conflict. Do you think that this is possible? Thanks in advance. /Massimo
205.9	It's possible	BROKE::PROTEAU	Jean-Claude Proteau	`Thu Apr 18 1996 11:22`	23
	Massimo, What Ivo suggests will probably work. We did not design Data Distributor with that scenario in mind. After all, if one of your machines goes down, its transfers then cannot be executed from one of the remaining machines in your cluster. However, if that is less important than avoiding the deadlock problem, that's the customer's choice. They should also note that stopping the transfer monitor, using the DDAL$STOP_TR_MON.COM procedure, will only stop the monitor on the local machine, not all monitors in the cluster. That is a consequence of using more than one transfer database. Each transfer database should have a set of transfer definitions for its own machine. Do not duplicate the definitions. You obviously don't want two machines executing the same transfer at the same time. Well, that's all I can think of for the moment. Regards, Claude
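The "do not duplicate the definitions" rule in 205.9 can be checked mechanically once the transfer names per machine are known. The following is a minimal Python sketch, assuming the names have already been collected by some other means; `duplicated_transfers` and the node and transfer names in the usage line are hypothetical and are not part of Data Distributor.

```python
def duplicated_transfers(definitions_by_node):
    """Return the transfer names that are defined on more than one node.

    definitions_by_node maps a node name to the list of transfer names
    held in that node's own transfer database.
    """
    nodes_per_transfer = {}
    for node, names in definitions_by_node.items():
        for name in names:
            nodes_per_transfer.setdefault(name, set()).add(node)
    # A transfer known to two or more nodes could be run twice at once.
    return sorted(
        name for name, nodes in nodes_per_transfer.items() if len(nodes) > 1
    )
```

For example, `duplicated_transfers({"AXP1": ["A", "B"], "AXP2": ["B", "C"]})` reports `["B"]`, the one transfer that both machines would try to execute.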
205.10	ok	itvms1.it.oracle.com::MRESNATI		`Thu Apr 18 1996 13:30`	9
	Claude, I will pass this information on to Ivo. Thanks very much for your kind cooperation. Regards, /Massimo
205.11	I'm Oracle, too.	itvms1.it.oracle.com::ITOTA		`Fri Apr 19 1996 15:53`	69
	Hi Claude,

	thanks very much for the time you and Massimo have spent on this. I'm now working for Oracle.

	Regarding the problems I experienced at Cassa di Risparmio, this is the current situation:

	1) Monitor crash problem: we solved the crash problem (I hope; we tested it, anyway) by using 2 ddal$tr_db databases (one per ddal monitor process). I suggest that it be checked and possibly fixed for the next release (it may happen if you need to define a lot of transfers in a cluster environment).

	2) Initial replication transfer problem: is the FT kit available anywhere? The customer would like to test it in order to decide on some actions for the near future.

	Another question: is the ddal$max_copy limit 40 in the next release as well?

	One more question: is there a customer anywhere that needs to replicate data to 400 or more sites (as Cassa di Risparmio needs to)?

	Currently, to solve the customer's problems (a lot of sites, one source db and slow lines to send data), we decided on (and created) this configuration:

	  1 source db
	  40 first level dbs (on the same AXP as the source, so very quick)
	  400 second level dbs (on the customer sites)

	In this way each of the 40 first level dbs distributes data to 10 second level dbs. During the initial replication phase we succeed in obtaining a formal 40 copy processes working in parallel (with neither lock nor deadlock problems). I think it's the only way the customer can start with 400 sites in the near future. Is it the right way, in your opinion?

	Anyway, I would like to know whether the deadlock problem on the source database (rdb$changes table) will be solved in the next release (see my notes 194.4, 5, 6, 7), just so that I can tell the customer the right thing; at present they can't start more than one transfer to the same target db at the same time.

	Thanks a lot for your patience,
	Ivo
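The two-level layout in 205.11 (1 source, 40 first level dbs, 400 second level dbs, 10 targets each) is simple arithmetic, and can be written out as a small Python sketch. The function `fan_out_plan` and the generated names (`L1_01`, `SITE_001`, and so on) are made up for illustration and do not correspond to the customer's actual database names.

```python
def fan_out_plan(first_level_count, second_level_count):
    """Assign second-level targets evenly to first-level databases."""
    if first_level_count <= 0:
        raise ValueError("need at least one first-level database")
    if second_level_count % first_level_count != 0:
        raise ValueError("targets do not divide evenly among first-level dbs")
    per_db = second_level_count // first_level_count
    plan = {}
    for i in range(first_level_count):
        # Each first-level db gets its own contiguous block of sites.
        first = i * per_db + 1
        plan[f"L1_{i + 1:02d}"] = [
            f"SITE_{n:03d}" for n in range(first, first + per_db)
        ]
    return plan
```

With `fan_out_plan(40, 400)`, each of the 40 first-level entries carries 10 sites and every site appears exactly once, matching the "1 to 40 to 400" description in the note.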
205.12	Some answers, some questions	BROKE::PROTEAU	Jean-Claude Proteau	`Sat Apr 20 1996 03:13`	78
	> 1) monitor crash problem :
	>
	> we solved ( I hope, we tested it , anyway ) the crash problem
	> using 2 ddal$tr_db ( one per ddal monitor process)
	>
	> I suggest it will be checked and possibly solved for the
	> next release ( if you need to define a lot of transfer in a
	> cluster environment it may happen )

	Do you have any idea how easy the problem is to reproduce and how often the customer encountered it?

	> 2) initial replication transfer problem
	>
	> May the FT kit is available anywhere?
	> Customer would test it to decide some actions for the next
	> future.

	Contact my manager, Steve Serra, about obtaining a field test (beta) kit. Steve can be reached on the Internet at sserra@us.oracle.com.

	> Another question , is ddal$max_copy limit 40 also in the next release ?

	No. I just checked our code and the current limit is still 20. Did one of us make some comment somewhere that it was changing to 40? We might consider such a change if there were a demonstrated need for it.

	> One question more:
	>
	> is there a customer anywhere that need to replicate
	> his datas on 400 or more sites ( as Cassa di Risparmio need )?

	Yes, Belgian Railways for one. I think they replicate to 600 sites.

	> Actually, to solve customer problems ( a lot of sites, one
	> source db and slow lines to send data ) , we decided ( and
	> created ) this configuration:
	>
	> 1 source db
	>
	> 40 first level dbs ( on the same AXP as source is, so very quick )
	>
	> 400 second level dbs ( on the customer sites )
	>
	> In this way anyone of 40 first level db will distribute data
	> on 10 second level dbs.
	>
	> In this way , during initial replication phase, we succeed in
	> obtaining a formal 40 copy processes working in parallel
	> ( with neither lock nor deadlock problem ).
	>
	> I think it's the only way customer can start with 400 in
	> the next future.
	> Is it the right way, in your opinion?

	Having 40 transfers running concurrently seems a bit too much for a single processor and database, but I'm only guessing. I have not personally performed parallel replication tests with that many transfers. I don't know if the system will be able to handle it. First, of course, you'll need the beta kit with the changes to allow parallel operation. Then you should test 20 transfers and check the utilization of system resources (cpu cycles and disk I/O) to see if you are coming close to saturation. There are also shared system resources in VMS which might become a bottleneck. If 20 seem to work well and there appears to be room to grow, we can talk about raising the built-in limit to a higher value.

	> Anyway, I would like to know if a deadlock problem on
	> the source database ( rdb$changes table )
	> will be solved in the next release
	> ( reference, my note 194.4,5,6,7 )

	I re-read the notes. My suggestion was that you contact my manager. I also asked about deferred snapshots on the source database. Does the customer use that option? Also, I was confused why you were executing two transfers to the same target database. I didn't think that that was your intent?

	Claude
205.13	Other details	itvms1.it.oracle.com::ITOTA		`Mon Apr 22 1996 14:28`	78
	>Do you have any idea how easy the problem is to reproduce and how often the
	>customer encountered it?

	I simply wrote 2 command procedures, each of which creates and schedules, for instance, 5 transfers. When these procedures were run at the same time (the first from AXP1, the second from AXP2, the two cluster members), the problem was easily reproduced. Of course, you may have two different target databases, because the deadlock problem is on the DDAL$TR_DB database.

	>No. I just checked our code and the current limit is still 20. Did one of
	>us make some comment somewhere that it was changing to 40? We might consider
	>such a change if there were a demonstrated need for it.

	I made a mistake. In the customer environment we have two AXPs, so for the customer the limit is effectively 40. Anyway, I think that 20 may not be very many. Consider our case, of course: the lines are very slow and, during some initial replication transfers, the copy processes are in LEF state, waiting for data to arrive at the target database.

	>Contact my manager, Steve Serra, about obtaining a field test (beta) kit.
	>Steve can be reached on the Internet at sserra@us.oracle.com.

	Last week our colleague M. Resnati sent mail to your manager to obtain this kit. So far he has received no mail from S. Serra.

	>Yes, Belgian Railways for one. I think they replicate to 600 sites.

	Does this customer use 1 source and 600 target dbs? Or better, do you have any other information about this customer's Data Distributor configuration?

	>Having 40 transfers running concurrently seems a bit too much for a single
	>processor and database, but I'm only guessing. I have not personally
	>performed parallel replication tests with that many transfers. I don't
	>know if the system will be able to handle it. First, of course, you'll
	>need the beta kit with the changes to allow parallel operation. Then
	>you should test 20 transfers and check the utilization of system resources:
	>cpu cycles and disk I/O to see if you are coming close to saturation.
	>There are, also, shared system resources in VMS which might also become a
	>bottleneck. If 20 seem to work well and there appears to be room to grow,
	>we can talk about raising the built-in limit to a higher value.

	The customer environment is a two-AXP cluster, so 20 processes on each. As I said before, the lines are very slow, so there is no system bottleneck when using 20 processes in parallel. I think a higher built-in limit would be better (for several reasons: new, more powerful cpus, lines that may be slow, and so on), with a way to resize it if needed.

	>I re-read the notes. My suggestion was that you contact my manager. I also
	>asked about deferred snapshots on the source database. Does the customer
	>use that option? Also, I was confused why you were executing two transfers
	>to the same target database. I didn't think that that was your intent?

	No deferred snapshot has been set on the source database. The reason the customer created two transfers to the same target database is this: if you need to alter a table related to a transfer, you need to drop and re-create that transfer; by defining more than one transfer, it is possible to avoid the initial replication phase for the other tables when you need to alter one table. The mistake was to start these transfers at the same time. O.K., I agree, but from the customer's point of view it should be possible to define and start more than one transfer to the same target database at the same time. Anyway, I'd like to know if there's anything new in the next release regarding this problem.

	Thanks very much,
	Ivo
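The limit arithmetic discussed in 205.12 and 205.13 (20 copy processes per machine, two AXPs, hence 40 in total) can be written down as a short Python sketch. The function names are hypothetical, and the idea that transfers beyond the limit simply wait and run in later "waves" is an assumption made for illustration, not documented product behavior.

```python
import math


def cluster_copy_limit(nodes, per_node_limit=20):
    """Total concurrent copy processes across the cluster.

    per_node_limit defaults to 20, the built-in limit quoted in the thread.
    """
    return nodes * per_node_limit


def waves_needed(transfers, nodes, per_node_limit=20):
    """Number of full rounds needed to run all transfers.

    Assumes (for illustration only) that transfers over the limit wait
    until a whole earlier round has finished.
    """
    return math.ceil(transfers / cluster_copy_limit(nodes, per_node_limit))
```

Under these assumptions, `cluster_copy_limit(2)` gives the 40 that Ivo mentions, and `waves_needed(400, 2)` gives 10 rounds for 400 direct targets.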
205.14	It's available in the current T7.0-3 beta kit	BROKE::PROTEAU	Jean-Claude Proteau	`Fri Apr 26 1996 15:36`	9
	re: .-1 Ivo, The new field test (beta) release, which is being called version T7.0-3, has some changes in it to support parallelism of replication transfers from the same source database. Claude