Data accumulation is SiSense's method for building ElastiCube data without completely refreshing a table. This post will describe the data accumulation functionality.
This post will give a user the ability to customize and optimize their ElastiCube build process.
High Level Summary
There are two options for accumulating data: by table or by index. In general, accumulate by table appends all of the source data to the ElastiCube table, and accumulate by index will use an index in an Elasticube table to filter the source data that is migrated.
Accumulate by Table
This option will append all of the data selected for the build onto the current table. No comparisons are made, and no data from the source is omitted. This option can be selected by selecting the additional preferences options for a table in the ElastiCube manager, and selecting 'Accumulate Data'.
If your source data contains rows that have already been loaded, the ElastiCube table will contain duplicate rows. The image below demonstrates a scenario where duplicate rows would be inserted.
For contrast- the following example shows usage of accumulate by table that doesn't generate duplicate rows.
Accumulate By Index
The index is a column in an ElastiCube table that is used to determine whether or not a row from the source data should be inserted. It must have a data type of either an integer or a date.
When you select an integer, only source rows with a value greater than the maximum index value in the ElastiCube table will be inserted. This option will never cause current data in the ElastiCube table to be modified or deleted. The following image demonstrates this logic:
After Load 1, the maximum index value is 3. In Load 2, the source index value of 2 is not inserted (since it is less than 3), but the source index value of 4 is inserted (since it is greater than 3).
When you select a date to accumulate by, there are 2 different options;
- Last Stored Value- This is identical to the integer index functionality. Only source rows where the index is greater than the ElastiCube table’s maximum date index value will be inserted. ElastiCube data is never removed or modified after it has been built.
- Last X Days- This option specifies the number of days preceding the build date to synchronize.
NOTE: Syncing "Since Last X days" requires the source data to be Ordered by the date.
Synchronization consists of deleting all of the rows in the ElastiCube table where the index is in the specified range, and inserting all source rows where the index value is in the specified range. This option does allow for ElastiCube table data to be deleted or modified. The following example will demonstrate this logic using a 3 day range.
In Load 1, the Ecube table started empty, so nothing was deleted. Both of the source rows are between 12/3/2014 and 12/5/2014 (the 3 day range), so they are inserted. In Load 2, all Ecube table records with the date between 12/4/2014 and 12/6/2014 are removed. In the source table, the data for 12/3/2014 and 12/4/2014 has been updated. However, since 12/3/2014 is outside of our 3 day range, the Ecube data for that date has not changed. Since 12/4/2014 is in the range, the original Ecube table record for that day was removed and was replaced by the source table record for that day. 12/5/2014 data is new, so it was also inserted into the ecube table.
With either date accumulation option, you can choose the Keep Only option, and specify how many days’ worth of data you’d like to keep in the ElastiCube table. If you specify 3 days for this option, after all of the other accumulate by index functionality is completed, rows in the ElastiCube table where the index value is more than 3 days old will be deleted. The following example demonstrates this:
Keep Only 3 days was specified, and only the past three days were kept.