http://vizdiff.blogspot.com/2016/10/counting-customers-who-bought-both-and.html]
In marketing nomenclature, this is called market basket analysis. The analysis is aimed at understanding customer purchasing behaviors and the correlations between products.
In Tableau Knowledge Base, there is an article about the topic.
Create Views for Market Basket Analysis
The suggested approach is good for relative small data set. Note that self joining will generate a much larger data set. In that specific Superstore data set of ~10K rows, it generates 150K rows after self joining. That is a 15x increase. The actual multiples are in proportion to the square of the number of products.
In our use case, we have data sets with millions of rows and a large selection of products, the approach may create a huge amount of data that will slow down Tableau a lot.
Here we propose an approach that does the aggregations before visualization. In the example, we use the Superstore data set in Excel. In our case the data is in a database. The SQL code is similar though.
The main steps are:
1. Load your data through the legacy connection which enables the Custom SQL feature.
2. Use this Custom SQL code to create data set. This will generate 289 rows given 17 Sub-Categories of products. That's all we need for visualization.
- SELECT COUNT([a.Customer ID]) as [Count], A, B FROM
- (SELECT DISTINCT [a.Customer ID],[a.Sub-Category] as A,[b.Sub-Category] as B
- FROM [Orders$] a INNER JOIN [Orders$] b
- on [a.Customer ID]=[b.Customer ID] )
- GROUP BY A,B
3. As a result, we will get a matrix like this. You can then base your viz on the data like the one in the KB article.
All the following steps are optional. They are about to further remove the redundancy in the above matrix which is symmetric. Those numbers on the diagonal are not needed.
4. The numbers on both sides of the diagonal are the same. We only need those data below the diagonal. Let's create a filter to remove the data on the diagonal and above it.
First create a calc field [OrderA]: (See Coding Case Statement Made Easy)
Create [OrderB] in the same manner.
5. Add the [Filter] below and we will remove those redundant numbers.
- OrderA>OrderB
And voila. The emphasis is that we can move most of the calculations to the data preparation stage. The resulting data set for visualization is quite small. That is what Tableau is best at.
Alternative filter
Another way to filter the matrix (without much coding) is to create IndexA and IndexB computing respectively along A and B. For example, IndexA (Similar for IndexB):
Then the filter is IndexA>IndexB. In the table calculation settings for the filter, IndexA and IndexB are set to compute respectively along A and along B.
That's it!
Thanks for sharing.It is really helpful.It would be good, if we can get the tbwx file for the same.
ReplyDeleteAt step 6, click the image to download the workbook.
DeleteThanks for the soon reply.It works.
DeleteHi, this a great post!
ReplyDeleteHowever, if you click on the image, the public viz is wrong somehow, there are Appliance-appliace, Binders-binders cirlces as well, but not all of them present.
Could you review it?
Fixed. Thanks for alerting.
DeleteGreat post.
ReplyDeleteI am trying to do something exactly like this for a project I am working on, but every time I try it, I get a sytax error at GROUP in the custom SQL entry. Any ideas what I am doing wrong? I have tried to mimic what is in the post as closely as I can.
ReplyDelete