Market Basket Analysis is a data mining technique that is widely used to identify consumer patterns, such as which items are purchased together, or how the purchase of one item affects the likelihood of another item being purchased.
One of the famous examples, even though it is almost certainly an urban legend, is the story about the beer and the diaper. A supermarket chain discovered in its analysis that customers that bought diapers often bought beer as well, so they put the diapers close to beer coolers, and their sales increased dramatically.
Another widely used example of cross selling on the web with market basket analysis is Amazon.com’s use of “customers who bought book A also bought or took an interest in book B”.
In our example, we will analyze the correlation between different products and product categories within a set of orders.
(Sample data files, ecdata and dash, attached below)
We will start by preparing our data model to support this type of analysis and then build a dashboard for our market basket analysis.
In our example we will use a Microsoft ‘Northwind’ database sample, which holds the sales transactions that occur between the “Northwind traders” company and its customers, as well as the purchase transactions between Northwind and its suppliers.
We will start by demonstrating how to adjust the data model to support market basket analysis by using a custom SQL to join a table to a duplication of itself.
Items are the objects that we are identifying associations between.
In our example, an item is a product.
Transactions are instances of groups of items coexisting together.
In our example, each order is a transaction.
Open the Northwind database in the Sisense ElastiCube Manager
The ‘OrderDetails’ table holds the order transactions and the products that were purchased in each order.
To analyze if there are any associations between one product or a product category to another, we will create a self-join with the following conditions:
Transaction unique identifier = Transaction unique identifier
In our example, Order ID = Order ID
Item <> Item
In our example, Product ID <> Product ID
Click “Add Data”, and then select the “Custom SQL Expression”.
A new window will open, enter the new table name (for example OrderDetailsForAnalysis), and click on the V icon.
Enter the self-join script in the table editor window:
SELECT o1.* ,o2.ProductID AS Product2ID, o2.Quantity AS Product2Quantity
FROM [Order Details] o1, [Order Details] o2
WHERE o1.OrderID = o2.OrderID AND o1.ProductID <> o2.ProductID
Preview the results and save the table.
A new table is created with no connection to any other table.
Connect the table to the same tables that the original table (OrderDetails) was connected to.
In our example, join the new ‘OrderDetailsForAnalysis’ table with the ‘Products’ and ‘Orders’ tables.
If the item we are analyzing is part of a hierarchy, like in our case where products can be grouped into a category, and we want to use the hierarchy as part of the analysis, we should do one of the following:
Bring the hierarchy as part of our new custom table (add the 2nd product name and category name as part of the SQL)
Duplicate the hierarchy dimensions table/s, and join the tables with the 2nd item identifier.
In our example, we choose the 2nd option, and duplicate the ‘Products’ and ‘Category’ tables, and join Product2ID from the ‘OrderDetailsForAnalysis’ table with ProductID from the duplicate ‘Products’ table.
We can disconnect the original ‘OrderDetails’ table and make it invisible.
Our data model is now ready. Build the cube and open the Sisense dashboard.
Analyze with Reports and Dashboards
You can easily visualize and analyze the correlation between one product to another or one category to another by creating a pivot table that will display the number of occurrences per pair.
Create a new widget, and add the following:
CategoryName from the 1st categories table
CategoryName from the 2nd categories table
Arithmetic function on a field (for example ‘Count All’ function the ‘Quantity’ field) from the transactions table
Click on the advanced configuration at the bottom left.
Drag the 2nd category name from the rows area into the column area.
You can also highlight some of the cells to highlight important information for the dashboard’s consumers. For example, mark the top correlation per each one of the different product categories.
Click on the color button of your measure.
A new window opens.
Click on the conditional tab, and set the formula to the max of occurrences.
Our product category correlation table is ready. Each number inside the table represents the number of individuals who have purchased both from the rows product category and the column product category.
Straight away, you can see that many of the categories are correlated with the beverage category, but beverage is the most correlated with confectionary and seafood.
Note, that since we are analyzing the product category and not the products themselves, we can have a situation where an individual purchased two different products from the same category, and this is why we have numbers greater than 0 where the rows and the columns category name is identical.
You can easily create different kinds of reports and graphs to visualize and assist you in your analysis.
Several examples are presented below:
Stacked column chart, that easily summarizes the contribution of each item (product/ category) to another
Tree map chart per each item (product/ category)
Pie chart that will display the correlated product/category distribution and can be affected by product/category filter selection within the dashboard