Derived Information
Derived information is information that is derived or calculated from other information in the database. In relational terms, this redundant information makes the table it is in de-normalized. For example, a bank balance is information that can be derived. Instead of adding up all the deposits and withdrawals from the time the account was opened, a bank can maintain a balance in the account record and manipulate the balance every time a transaction occurs on the account. Although derived data can be an efficient way to manage computing resources in a non-replicated environment, you must be aware of the risks of using derived information in a replicated environment.
Example: Risks of Replicating Derived Data
The following scenario shows the problems of using derived data to obtain bank balances in a replicated environment. Assume there is a banking application with three replicated database copies. There is one database copy per branch, designated respectively as Databases A, B, and C.
On Monday afternoon, a customer opens an account at Branch A and deposits $100.
As of Tuesday morning, the customer’s account balance is reflected correctly in Databases A, B, and C, which show a balance of $100.
On Tuesday at 9:00 am, the customer visits Branch C. At this time data communications is disrupted between Branch C and the other branches. The customer withdraws $50 from the account.
The local database at Branch C commits the transaction at 9:05 am, showing a balance of $50. The account balance is queued for distribution to Branches A and B.
The customer drives across town to Branch A. There, the customer withdraws another $50 from the account.
Due to the communications problem, the account balance at Branch C has not yet arrived at Branch A. The local database at Branch A commits the transaction at 9:20 am. The Branch A database shows a balance of $50 in the customer’s account. The account balance at Branch A is queued for distribution to Branches B and C. Branch B receives the update from Branch A and changes the customer’s balance to $50.
The result of these steps is that all three branches end up showing that the customer has a balance of $50 in the account. However, the true balance must be zero.
Assume now that the communication problem at Branch C is repaired on Tuesday at 9:30 am. The Replicator Server at Branch C detects a collision when it attempts to update the customer’s balance at Branch A, and again at Branch B. Similarly, the Replicator Server at Branch A detects a collision at Branch C.
Each Replicator Server’s response to collision is dependent on the way the CDDS containing the balance information is configured. The collision can be resolved in one of the following ways:
• Branches A and C abandon their attempts to distribute changes. The customer’s balance at Branches A and B remains $50 with a timestamp of 9:20 am. The customer’s balance at C remains $50 with a timestamp of 9:05 am. Data at all three sites is incorrect and inconsistent.
• Branches A and C distribute their changes. The customer’s balance at both Branches A and B is $50 with a timestamp of 9:05 am. The customer’s balance at Branch C is $50 with a timestamp of 9:20 am. Data at all three sites is incorrect and inconsistent.
• Branch A’s change which has a later timestamp prevails over Branch C’s. The customer’s balance is $50 with a timestamp of 9:20 am at all three branches. Data is incorrect, but consistent.
The problem in these resolution scenarios is that information is lost (or at best temporarily misplaced if we assume that the archive tables or log files contain the missing information). Even though the customer performed two transactions, the collision handling permitted the recording of only one transaction. The aggregate result of the two transactions is therefore overlooked. A possible solution to this problem is shown in Example 2.
Example: Avoiding Risks in Replicating Derived Data
The potential risk of replicating derived data, as shown in the previous example, can be avoided if each branch maintains its own balance on its own local database, as shown in the following example.
The customer withdraws $50 from the account at Branch C. Data communications are disrupted to all other branches.
Branch C queues the $50 transaction for replication to Branches A and B.
The customer withdraws $50 from Branch A. Branch A queues the $50 transaction for replication to Branches B and C. The transaction from A is replicated to B.
The customer’s balance is now $50 in both Branches A and B. When the network comes up for Branch C, Branch A and B are informed of the $50 withdrawal that the customer made at Branch C. Branches A and B can update their balances. Branch A informs Branch C about the customer’s other $50 withdrawal transaction. All branches show the correct balance of $0.
In this example, the transactions are replicated in the form of deposits and withdrawals. Because the balance is not being replicated, the replication does not collide on the balance table.
Last modified date: 08/14/2024