In some cases, you would like to count the number of unique entities that did a certain action/transaction. For example, how many individuals entered my store last month and purchased one of my products. But you might also be interested in cumulative distinct count month over month. For example, how many individuals purchased a product from my store from the beginning of the year up to this month. This question might help you understand how effectively is my marketing campaign, the popularity of my store and which one of my products is more stable, need more attention or need to be removed to make more room on the shelf for more successful products.
If you will use simple COUNT aggregation (which basically counts the unique appearances of the selected field) per month, you will get the number of the individuals who entered only on the specific month (not including all the ones that visit the store before). Unlike Running Sum and Average, Running Distinct Count is not an existing function.
Let’s assume that this is the structure of your transactional table:
If we’ll use a simple COUNT within a pivot table, it will look like this:
You can see that for example, for categoryId 1000 had 9 unique buyers on 01/2017, but on 02/2017 BuyerId 19056 didn’t purchase it again, so the total for that month will be 8 even though we would like to include that buyer for this month, because he or she purchased it in the past.
On another example, for category 1001 for the first two months the total is 7, but in fact if you’ll take in the second month all the unique buyerIds that purchase that product from the beginning of the selected timeframe, we should expect to see 9 in the total row.
So instead of having this summary looks like this:
We should see the following results set:
We’ll need to refer each purchase to all its upcoming months:
- To do so, first we’ll need to build the infrastructure for it, which will include all possible combination between Month and Category, with its relevant BuyerId and amount:
FROM (SELECT DISTINCT TA1.categoryId, TA2.month FROM (SELECT DISTINCT TA.categoryId FROM Transaction_Agg TA) TA1
CROSS JOIN (SELECT DISTINCT TA.month FROM Transaction_Agg TA) TA2) TA3
LEFT JOIN Transaction_Agg TA4 ON TA3.categoryId=TA4.categoryId AND TA3.month=TA4.month
Its outcome would look like this:
- Then, we’ll replicate each transaction, so it will have record for every future month within the data in addition to the actual month:
t2.month AS [running count Month]
LEFT JOIN t1 AS t2 ON t1.month<=t2.month AND t1.categoryId=t2.categoryId
The outcome will look like this:
On the dashboard side:
- If you use the ‘running count month’ as your date field within the new table, t2, you’ll get the expected result. For example:
Important To Note
You can use the same mechanism for the running sum of the amount for each CategoryId month over month, just need to remember that in the new table the amount is being multiplied. So you might want to consider using a multi-pass aggregation in the widget formula.
Also, if you're dealing with a large data scale you probably should test it first as you'll require to multiple your data by [# of unique dates] X [# of unique categories]