Surrogate Keys are mainly used for creating a single identifier for multiple dimensions, mainly when building a Key Table. A text key can be easily generated by concatenating the text and delimiters, however, text can become very long and affect the performance.
The following method, will demonstrate how to create a numeric key to avoid using long text strings.
Important Note -
For using this method, the original keys should be numeric and the overall generated key length must fit a Bigint length (18 digits)
Step 1 - Planning your key
Planning and documenting the key is extremely important. For each one of the dimensions we want to incorporate in the key, we need to understand the range and number of unique values.
You can use the following custom SQL expression -
Document the key in an Excel table -
Allocate Digits and Define alerts -
For each dimension, we need to allocate the relevant number of digits within a single bigint field, while planning the location of each dimension in this field (we’ll call it key index).
Growing dimension id values should be taken into consideration. In the following example, we allocated 3 digits for Employee_Id - 2 digits to support up to 99 different employees and 1 digit to support a possible negative value (see below). We don’t expect the employee id to ever grow over 20. For customers, we allocated 4 digits to support up to 9999 customers and no negative value. The “Alert Flags” will be defined in a table to alert key breaches.
The key logic -
We will allocate the indexes of a bigint digits to each key based on the above allocation and create the key in our Fact Table.
In the following snapshot you can see the allocation -
For cases of negative values in the dimensions, we need to allocate a digit that will represent the sign. In the example above, in case Employee_Id = -1, the key will hold the value 101 while the most left digit represents a negative value. If Employee_Id=1, the key will hold the value 001.
Null Values -
In case of a null value, we will allocate 0
NOTE - For each use case there may be different specifics and limitations, they should be carefully investigated and tested - if the logic can’t guarantee consistency or doesn’t fit the total number of digits (18 digits) - this method cannot be applied
Keys Examples -
Employee_Id=9, Customer_Id=80, Product_Id = 50
Key - 000000000500080009
Employee_Id=-1, Customer_Id=101, Product_Id = 5
Key - 000000000050101101
Employee_Id=7, Customer_Id=NULL, Product_Id = 500
Key - 000000005000000007
Generating the Key -
Create a new BigInt type custom field -
In order to generate the key, we’ll multiply each dimension ID with a multiple of 1 plus the number of zeros to match the index in the key and use the ‘+’ operator to add the next dimension value.
In our example, product is the left most dimension and it ends in index 7 (counting from 0) - therefore, we will multiply it by 10000000. To it, we’ll add the customer id multiplied by 1000 since it “ends” in the index 3. The last dimension, will not be multiplied since it’s the last one in the key.
In addition, we will use IFBIGNT syntax to handle the negative / null conditions -
ifbigint(isnull([Product_Id]),0,[Product_Id])*10000000 ---Product ID, Indexs 7 - 10
ifbigint(isnull([CustomerID]),0,[CustomerID])*1000 --- Customer ID, Indexes 3 - 6
ifbigint(isnull([EmployeeID]),0, /*if not null - negative test*/ ifbigint([EmployeeID]<0,100+(-1)*[EmployeeID],[EmployeeID])) --Employee ID, Indexes 0 - 2
Creating an Alerting Table -
For each potential breach we want to monitor, we can create a custom SQL with the relevant condition and then add a PULSE alert that will trigger an email in case of a breach. This will allow us to maintain a valid key.
Example Syntax -
max(e.EmployeeID) LowValue, min(e.EmployeeID) MinValue, DISTINCT_count(e.EmployeeID) NumberOfValues,
WHEN DISTINCT_count(e.EmployeeID)> 99 THEN 1
END AS AlertFlag -- Alerting when the range is breaching
FROM Employees e
max(c.CustomerID) LowValue, min(c.CustomerID) MinValue, DISTINCT_count(c.CustomerID) NumberOfValues,
WHEN min(c.CustomerID)<0 THEN 1
END AS AlertFlag -- Alerting when CustomerID is negative
FROM Customers c