3. Planning Your Replication Scheme : Application Design Issues : Derived Information : Example: Risks of Replicating Derived Data
 
Share this page                  
Example: Risks of Replicating Derived Data
The following scenario shows the problems of using derived data to obtain bank balances in a replicated environment. Assume there is a banking application with three replicated database copies. There is one database copy per branch, designated respectively as Databases A, B, and C.
On Monday afternoon, a customer opens an account at Branch A and deposits $100.
As of Tuesday morning, the customer’s account balance is reflected correctly in Databases A, B, and C, which show a balance of $100.
A
B
C
$100
$100
$100
On Tuesday at 9:00 am, the customer visits Branch C. At this time data communications is disrupted between Branch C and the other branches. The customer withdraws $50 from the account.
The local database at Branch C commits the transaction at 9:05 am, showing a balance of $50. The account balance is queued for distribution to Branches A and B.
A
B
C
$100
$100
$50
The customer drives across town to Branch A. There, the customer withdraws another $50 from the account.
Due to the communications problem, the account balance at Branch C has not yet arrived at Branch A. The local database at Branch A commits the transaction at 9:20 am. The Branch A database shows a balance of $50 in the customer’s account. The account balance at Branch A is queued for distribution to Branches B and C. Branch B receives the update from Branch A and changes the customer’s balance to $50.
A
B
C
$50
$50
$50
The result of these steps is that all three branches end up showing that the customer has a balance of $50 in the account. However, the true balance must be zero.
Assume now that the communication problem at Branch C is repaired on Tuesday at 9:30 am. The Replicator Server at Branch C detects a collision when it attempts to update the customer’s balance at Branch A, and again at Branch B. Similarly, the Replicator Server at Branch A detects a collision at Branch C.
Each Replicator Server’s response to collision is dependent on the way the CDDS containing the balance information is configured. The collision can be resolved in one of the following ways:
Branches A and C abandon their attempts to distribute changes. The customer’s balance at Branches A and B remains $50 with a timestamp of 9:20 am. The customer’s balance at C remains $50 with a timestamp of 9:05 am. Data at all three sites is incorrect and inconsistent.
Branches A and C distribute their changes. The customer’s balance at both Branches A and B is $50 with a timestamp of 9:05 am. The customer’s balance at Branch C is $50 with a timestamp of 9:20 am. Data at all three sites is incorrect and inconsistent.
Branch A’s change which has a later timestamp prevails over Branch C’s. The customer’s balance is $50 with a timestamp of 9:20 am at all three branches. Data is incorrect, but consistent.
The problem in these resolution scenarios is that information is lost (or at best temporarily misplaced if we assume that the archive tables or log files contain the missing information). Even though the customer performed two transactions, the collision handling permitted the recording of only one transaction. The aggregate result of the two transactions is therefore overlooked. A possible solution to this problem is shown in Example 2.