When using "Accumulate By" builds (see below), Sisense will only import "new" records based on a selected INT or DATETIME index field - if values in the source DB exceed the maximum value of that index as of the previous ElastiCube build, they will be inserted into the cube.
The logic will be similar to adding a WHERE clause to the SQL Query: "WHERE Source.[Index] > ElastiCube.MAX[Index]"
In some use cases we might need to support updates and deletions - typically for updates a new record is added to the source with the same ID as the initial record, but with a later timestamp.
This article describes how to manage those records in your ElastiCube.
The Accumulation build option can only be applied to the entire table and specific INT or DATETIME field. You can find more information on accumulative build options here.
In order to track the deleted records you need to create the logic on the data source side that will allow identifying those deleted records and differentiating them from their initial entry. Add a new column in the database and whenever the record should be removed - instead of performing DELETE operation - use UPDATE which sets:
- The data columns to NULL
- ID of the record remains the same
- Deleted column = TRUE (or "1")
An initial "Replace All" build is required to import our first set of data. This initial data will be used as our baseline to determine if additional records are new, deleted, or updated. The first build will use the "Replace All" option, then subsequent builds will use the Accumulation option.
Example initial table ("FACT_Raw_Data"):
** Important Assumption **
The [OrderDate] of records is correctly set in the first insertion – no future/ faulty dates can be set (original record should not have a bad date)
Steps to implement:
- To obtain the record of interest, we will first need to create a custom table in the ElastiCube ("Latest_Record") to flag that MAX date per transaction.
The SQL should look something like this:
MAX(a.[OrderDate]) AS max_record
FROM [FACT_Raw_Data] a
GROUP BY a.[OrderID]
- Perform a build ("Changes Only" is okay) so we can use our new "Latest_Record" table in future custom table expressions.
- Create another custom table ("FACT_Sales") INNER JOIN'ing the data in our initial "FACT_Raw_Data" table with the max record of the transaction in the "Latest_Record" table.
SELECT DISTINCT a.*
FROM [FACT_Raw_Data] a
INNER JOIN [Latest_Record] b
ON a.[OrderID] = b.[OrderID]
AND a.[OrderDate] = b.[max_record]
WHERE deleted = 0
The results of executing this query will be identical to the "FACT_Raw_Data" table at this point, as each transaction only has one record in that table (no updates or deletions have been performed yet):
- Perform another build ("Changes Only" is okay again) so we can use our new "FACT_Sales" table. Subsequently every build after should use the Accumulation option. Use the "Accumulate By" field option and select [FACT_Raw_Data].[OrderDate] as your field or accumulate the entire table.
Second Load – Build Accumulate:
- Dashboards will filter out deleted rows per the INNER JOIN applied in our "FACT_Sales" table (rows highlighted in red will not be displayed):
Note: Be sure to hide your initial "FACT" (in our case, "FACT_Raw_Data") and "Latest_Record" tables in the cube to prevent dashboard designers from inadvertently sourcing widgets from these tables. "FACT_Sales" should be the ONLY table incorporated into the larger data model / sourcing widgets on the front-end to ensure accurate reporting.
Please sign in to leave a comment.