[Update: A new method has been posted
http://vizdiff.blogspot.com/2016/10/counting-customers-who-bought-both-and.html]

In marketing nomenclature, this is called market basket analysis. The analysis is aimed at understanding customer purchasing behaviors and the correlations between products.

In Tableau Knowledge Base, there is an article about the topic.
Create Views for Market Basket Analysis

The suggested approach is good for relative small data set. Note that self joining will generate a much larger data set. In that specific Superstore data set of ~10K rows, it generates 150K rows after self joining. That is a 15x increase. The actual multiples are in proportion to the square of the number of products.

In our use case, we have data sets with millions of rows and a large selection of products, the approach may create a huge amount of data that will slow down Tableau a lot.

Here we propose an approach that does the aggregations before visualization. In the example, we use the Superstore data set in Excel. In our case the data is in a database. The SQL code is similar though.

The main steps are:

1. Load your data through the legacy connection which enables the Custom SQL feature.

2. Use this Custom SQL code to create data set. This will generate 289 rows given 17 Sub-Categories of products. That's all we need for visualization.
  • SELECT COUNT([a.Customer ID]) as [Count], A, B FROM 
  •     (SELECT DISTINCT [a.Customer ID],[a.Sub-Category] as A,[b.Sub-Category] as B
  •     FROM [Orders$] a INNER JOIN [Orders$] b
  •     on [a.Customer ID]=[b.Customer ID] )
  • GROUP BY A,B
Note in the Tableau KB article, the self joining is equivalent to the code in lines 2 - 4 above. Adding two more lines of SQL code (highlight in yellow) will save us a huge amount of data load. What it does is all the aggregations per pair of product sub-categories. This can make a big difference in visualization speed.

3. As a result, we will get a matrix like this. You can then base your viz on the data like the one in the KB article.
All the following steps are optional. They are about to further remove the redundancy in the above matrix which is symmetric. Those numbers on the diagonal are not needed.

4. The numbers on both sides of the diagonal are the same. We only need those data below the diagonal. Let's create a filter to remove the data on the diagonal and above it.

First create a calc field [OrderA]: (See Coding Case Statement Made Easy)
Create [OrderB] in the same manner.

5. Add the [Filter] below and we will remove those redundant numbers.
  • OrderA>OrderB
6.Here is the viz based on the data. Click image to view or download the interactive version.
And voila. The emphasis is that we can move most of the calculations to the data preparation stage. The resulting data set for visualization is quite small. That is what Tableau is best at.

Alternative filter

Another way to filter the matrix (without much coding) is to create IndexA and IndexB computing respectively along A and B. For example, IndexA (Similar for IndexB):

Then the filter is IndexA>IndexB. In the table calculation settings for the filter, IndexA and IndexB are set to compute respectively along A and along B.
That's it!
7

View comments

  1. Thanks for sharing.It is really helpful.It would be good, if we can get the tbwx file for the same.

    ReplyDelete
    Replies
    1. At step 6, click the image to download the workbook.

      Delete
    2. Thanks for the soon reply.It works.

      Delete
  2. Hi, this a great post!

    However, if you click on the image, the public viz is wrong somehow, there are Appliance-appliace, Binders-binders cirlces as well, but not all of them present.
    Could you review it?

    ReplyDelete
  3. I am trying to do something exactly like this for a project I am working on, but every time I try it, I get a sytax error at GROUP in the custom SQL entry. Any ideas what I am doing wrong? I have tried to mimic what is in the post as closely as I can.

    ReplyDelete

(Refresh the page if you want to view the gif image multiple times. Or go to Tableau Public and click the button at the top-right corner.)

Jake and I collaborated on a dashboard. He told me that he learnt a way to create an in-place help page in Tableau. He first saw it at a conference somewhere and couldn't recall who the speaker was. So I am blogging here about it but the credit goes to somebody else. If anyone knows who the original creator is, leave a comment below.

The key idea is to float a semi transparent worksheet on top of the dashboard, where a help text box is strategically placed on top of each chart. This way, we can explain how to view each chart and what data points are important, etc. This worksheet is collapsible by a show/hide button. 

Below I would like to show how this worksheet can be constructed.

1. Sheet with a single data mark.

  • Double click the empty space in Marks panel and add two single quotes. Make the null pill a text label. This creates a single null mark.
  • Set the view as "Entire View"

2. Create an show/hide button

  • Go to the target dashboard
  • Drag a floating vertical container to the dashboard, making it cover all the area of interest.
  • Drag the Single Null Mark sheet and drop it into the above container. Hide the sheet title.
  • Create an open/close button for the container and place the button at the top-right corner.

3. Add annotations

  • Format the sheet background opacity as 70% in the layout manager             
  • Select area annotations and place them anywhere of interest. 
  • Write help text and format it to highlight important messages.  
  • The text can serve as functional guide and/or insight guide.

Here is an example. Feel free to download the workbook and explore. Click the "i" button at the top-right corner to view the in-place help. 

0

Add a comment

Blog Archive
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.