Adding Data Security to an existing Elasticube Model can have little to great effect on the performance of the system. It is important to already have planned the Elasticube Model with the intention to apply Data Security in the future. This article will detail modeling strategies to overcome Data Security pitfalls.
Elasticube Model for Analyzing Sales and Inventory
In the above scenario we want to limit our employees to only see the Sale and Purchases corresponding to themselves. We facilitate this by applying data security on the EmployeeId column and joining the DimEmployees to Fact_Sale_Orders and Fact_Purchase_Orders tables. Now when a user logs into a dashboard sourced from our sales Elasticube they will only see data relevant to themselves.
This model now presents us with a problem. Sisense has two options to produce the members results when a dashboard filter is created using the following dimension tables: DimProducts / DimCountries / DimDate. The Data Security will try and join Dim_Employees (where the Data Security is defined) and one of the other three dimension tables. Sisense can either join through the Fact_Sale_Orders or Fact_Sale_Orders tables, which is a random path problem. Because of this, a user could see inaccurate results on the dashboard filter members - reduced members or members that they are not supposed to see.
Eg: The following screenshot illustrates what happens when a dashboard filter is created on a field from DimCountries table. There are two paths that are possible from the security table (DimEmployees) to the dashboard filter's table (DimCountries). Sisense chooses between the path (1) and (2) in random as both the query paths are tied for the shortest path.
Solution 1 - Key Table:
The Key table is a central table that stores all the relationships between the primary key values. This solution solves our problem because the key table has all the possible combinations, so when we join to produce results, we will have all the possible options available from the Fact tables. You can find more details on Key Tables here.
- Key Table Size - Because the key table is essentially a cross join between the dimension tables it can grow exponentially in size.
- Dashboard Performance - The Key Table increase the path length between our dimension and fact tables which will result in slower dashboard performance
- Maintenance - Iterating on an Elasticube Model containing a Key Table could be challenging because planning is key when creating a key table.
- Build Time - Additional build time will be needed to calculate Key Table
- Keep the Key table size below 50M records
- Try creating partial key tables instead of including all your dimensions if not needed
- Plan your Key Table!
Security Exception on certain Dimension Tables:
Sometimes it is not necessary to apply security on all the dimension table - hence, with the approach of Key Table, you need not include those dimension tables to the Key table and they can be directly connected to the Fact tables. Refer to the screenshot below - usually it is DimDate table that need not be protected.
After this, while setting up Data Security, you need to ensure you set the scope correctly. DimDate table should be excluded from the Data Security's scope. Refer to the screenshot below - this means that, when you create a dashboard filter using DimDate table or a widget just using fields from DimDate table, the Data Security will not take effect.
You can read more about "Scope Limitations" in Row-level Data Security from here to understand all the different options in detail.(Scope Limitations are available from version 8.2.1 and above)
Solution 2 - Consolidation:
If our fact tables are on the smaller side (<100M records) we could look to consolidate our fact tables into a single fact table.
You can create a Custom SQL table to union the two fact tables together. A flag can be created to differentiate the fact tables. In this example we have created the "Type" field that has two values: Sale or Purchase.
Now, the data security is defined on the DimEmployees table (Security) and if there is a dashboard filter created off of DimCountries / DimDate / DimProducts (Filter), there is only one path for Sisense to reach from the Security dimension table to the Filter dimension table which is via the only fact table in the model - Fact_Sales_Purchase. Thus, there is no random path issue here.
- Additional Build time to process Custom SQL Table
- All Dashboard formulas will need to contain a value filter on Type
- Consolidated Elasticube record size does not exceed 250M records
Solution 3 - Bridge Table:
An intermediary table can be created to increase the path length between dimensions and facts to control query routing. Generally, to create a query path from table 1 to table 2, Sisense uses the shortest path algorithm. When we have two fact tables in model, we choose one of them as the preferred fact table and we try to force the Sisense query path through that fact table. To force the query path, we manipulate the length of the paths by adding one extra table to the path involving the non-preferred fact table, thus indirectly routing Sisense to choose the path involving the preferred fact to reach from the Dimension dimension table to the Filter dimension table.
Creating Bridge Table and setting up the path:
For every Dimension table that needs to be filtered by Data Security (DimEmployees) we need to create one bridge table. In our example, we need DimCountries and DimProducts to be filtered by Security dimension. For each of these table, follow the steps below: (Steps below for DimProducts)
- Step1: Duplicate the key in the DimProducts table and name it "Product ID (1)"
- Step2: Create a custom table with the following SQL and name it "Product_Bridge"
- Step3: Join DimProducts table and Fact_Sale_Orders table (preferred fact table) directly using DimProducts.Product ID and Fact_Sale_Orders.Product ID
- Step4: Join DimProducts table and Product_Bridge table using DimProducts.Product ID (1) and Product_Bridge.Product ID (1). Then join Product_Bridge and Fact_Purchase_Orders (non-preferred fact table) on Product_Bridge.Product ID and Fact_Purchase_Orders.Product ID.
Similarly, repeat the above steps for DimCountries table as well by creating Country_Bridge table.
The other dimension table in the data model that doesn't have a bridge table is DimDates table which will not be affected by Data Security as the data in the table is insensitive. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDates from the Data Security's effect.
How does the Query Path work?
You can see from the above image that two tables have been created to increase the path length. The DimProduct table is directly joined to the Fact_Sales_Orders table but joined to the Product_Bridge table to reach SalesOrderDetail. Alternatively DimCountries table is directly joined to the Fact_Sales_Orders table but joined to the Country_Bridge table to reach SalesOrderDetail.
So, when DimProducts is used as the Filter Dimension for a dashboard filter and DimEmployees is the security dimension table, taking the query path through Fact_Sale_Orders (in green - 2 joins) will be shorter than the query path through the Fact_Purchase_Orders (in blue - 3 joins) table as the addition of Product_Bridge table increases the query path's length of the latter. Refer to the screenshot below.
- Potential Many 2 Many joins could occur through query routing created by the Sisense Translator
- Duplicate the Primary Key in the Dimension table - one key for Fact_Sale_Order and another for the bridge table so Sisense does not create a join through the bridge table to both Fact tables.
- Also, create two copies of keys in bridge table where one is used to connect to the dimension table and the other is used to connect to the fact table - this will not create a direct join between the dimension table and the fact table (non-preferred fact)
Solution 4 - Dual Bridge:
There would be two intermediary tables - one on the Fact tables' side (Fact_Bridge) and the other on the Dimension tables' side (Dim_Bridge). These intermediary tables are identical copies of the Security Dimension table and will act as a "bridge" between the Security Dimension table and all Fact tables & all Dim tables. The only constraint to use this approach is that, the foreign key to the security dimension table should be present in all the sensitive tables - Fact and Dimension tables whose data is to be securely maintained by data security.
Creating two Bridge Tables and setting up their path:
Remove all the connections on the Security Dimension table and duplicate the Security Dimension table twice - Fact_Bridge and Dim_Bridge. Just keep the primary key field + another copy of the primary key (ID and ID_Sec) in the duplicated tables. Also, make two copies of the primary key on the Security Dimension table and name it ID_FB and ID_DB.
In the example below, the above steps have been performed to the data model. The Security Dimension table is: DimCustomers (CustomerID) and foreign key to Dim_Customers is available on sensitive dimension tables (DimProducts and DimEmployees) and all the fact tables.
The Fact_Bridge will be between the Security Dimension table and all the Dim tables in the data model. The Dim_Bridge will be between the Security Dimension table and all the Fact tables. Perform the following steps:
- Connect DimCustomers and Fact_Bridge on DimCustomers.CustomerID_FB & Fact_Bridge.CustomerID_Sec
- Connect DimCustomers and Dim_Bridge on DimCustomers.CustomerID_DB & Dim_Bridge.CustomerID_Sec
- Connect Dim_Bridge and all the Fact tables on Dim_Bridge.CustomerID & Fact Tables' Customer ID
- Connect Fact_Bridge and all the sensitive Dimension tables on Fact_Bridge.CustomerID & Sensitive Dimension Tables' Customer ID. (Only on DimEmployees & DimProducts)
After making the above connections, the model will look like the one on the following screenshot:
The other dimension tables in the data model that don't have the CustomerID are DimDate and DimCountries tables which will not be affected by Data Security as the data in the table is insensitive - hence, they are not connected to the Fact_Bridge table. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDate & DimCountries tables from the Data Security's effect.
Here is the Scope Limitation settings:
How does the Query Path work?
When you create a dashboard filter using the table DimEmployees table, Data Security will come into effect and hence Sisense would create a query path between DimCustomers and DimEmployees table. In our data model (as in the screenshot below), there are three possible paths where the #1 is through the Fact_Bridge and #2 and #3 are through the Dim_Bridge table. Going through the Dim_Bridge table is a longer route compared to taking the Fact_Bridge table in this case - So Sisense will prefer the shortest path here ending up taking the query path #1, which is the desired path.
Now, when you create a widget with Fact_Sale_Orders, Sisense would try to create a query path between DimCustomers and Fact_Sales_Orders - there will be three paths possible in our data model (as in the screenshot below) where #1 will go through the Dim_Bridge and the other two will go through Fact_Bridge and then, each of the sensitive dimension tables. Sisense will obviously choose the path through the Dim_Bridge which is the shortest path here in this scenario and is the desired one.
With the help of the Bridge, we could play with the length of the query paths and force Sisense to take the desired path in different scenarios.
- Potential Many 2 Many joins could occur through query routing created by the Sisense Translator when there is complicated front-end formula functions used - like ALL() in certain scenarios and PAST/NEXT/GROWTH functions when applied on the date field that's a part of the sensitive dimensions in the data model. (ex: If DimDate is made sensitive to data security in the example and then PAST/NEXT/GROWTH functions are used in the context of DimDate.Date field)
- The Dependent filters (dashboard filters) on two different dimension tables doesn't work properly due to Many 2 Many joins
- Duplicate the Primary Key in the Security Dimension table twice - one to connect to Dim_Bridge and the other to connect to Fact_Bridge
- Also, create two copies of keys in each of the bridge tables where one is used to connect to the security dimension table and the other is used to connect to other dimension/fact tables
One could use one of the above approaches depending on factors like: how the model is structured, how the fact tables in the model are related, what is the desired fact table to apply the security on the dimensions, whether the sensitive dimension tables are directly related to the security dimension table and what the specific usecase is.