5. Data Loading : Incremental Data Load : Moving Window of Data
 
Share this page                  
Moving Window of Data
A moving window of data is very common for fact tables in a data warehouse or data mart. For example: you want to keep the most recent 36 months of data online. Below are three approaches to achieve a moving window data loading strategy with Vector:
1. Append data directly to the fact table in Vector as it arrives--for example, add data daily or multiple times per day. This approach assumes that the fact table is not indexed.
On a monthly basis, create a staging table by selecting the (primary key for the) rows from the fact table for the month you want to purge (CREATE TABLE...AS SELECT...) and then use the MODIFY...TO COMBINE statement to remove the data from the original table.
MODIFY basetable EXCEPT staging TO COMBINE
Then drop the staging table.
2. Append data to staging table STAGING1 rather than the base fact table as it arrives--for example, add data daily or multiple times per day. This approach assumes that staging table STAGING1 is not indexed. Create a view to union (all) the results of staging table STAGING1 and the base fact table.
On a monthly basis, create a second staging table STAGING2 by selecting the (primary key for the) rows from the fact table for the month you want to purge (CREATE TABLE...AS SELECT…) and then use the MODIFY...TO COMBINE statement to add the data in staging table STAGING1 and remove the data from the original table which is in staging table STAGING2.
MODIFY basetable UNION staging1 EXCEPT staging2 TO COMBINE
For this operation the base table may be indexed.
Drop staging table STAGING2.
Then empty the table STAGING1:
MODIFY staging1 TO TRUNCATED
3. Use multiple tables with the same structure to represent all data for the fact table (for example one table per month--36 tables total for 36 months of data).
Create a view to UNION ALL the results of all tables and present the data for all tables as a single fact table view to an application. Add data as it arrives to the most recent month of data. If the table is indexed then consider using MODIFY...TO COMBINE to add the data. On a monthly basis drop or empty (using the MODIFY...TO COMBINE statement) the oldest table.
Ideally, add a clause in the UNION definition that specifies WHERE date_range BETWEEN <first minute in first day of month 1> and <last minute of last day of month 1> for each view being unioned together, to provide information to the optimizer that each view contained only rows in that range. This helps to eliminate views that are not relevant.
Beware that the different approaches may result in different query execution plans that can impact performance of queries against the base table data.