There has been a calculation that is needed in market basket and correlation analysis. There is a KB article about the topic (by self-join) and I wrote a post about it before (by custom SQL).

Recently Łukasz Majewski has provided an ingenious solution using data densification. I am totally stumped. I am going to applying it to the same superstore example that I used before, and as a result, to document the details.

The objective of the analysis is to count customers who bought in both A and B subcategories of products.

The main steps are:
1.Duplicate Sub-Category field: Sub-Category (Copy). Drag them into columns and rows respectively. We see 17 data marks in the view.
2.Create Index (=Index()) and drag it into Details shelf. Set it to compute using Table Across or Subcategory. This populates the cross tab table with 17x17=289 data marks.

Each mark represents a partition by a pair of sub categories, including redundancies like (Accessories, Accessories),(Accessories,Art), (Art,Accessories).

It is equivalent to a cross join between Sub-Category and Sub-Category (Copy). We accomplished this using Self Join or Custom SQL. Obviously, this method via data densification requires the least operations.
3.Drag Customers Names into Details. This brings one more dimension into the view. At each pair partition, there are multiple customers, each with a mark. The data marks are overlapped and we don't see them. The number at the lower left corner shows it all: 50682 in total.
4.Create Size (=Size()) and drag it to Labels shelf. Set it to compute along Customer Names.

This Size() function shows how many data marks at each pair partition. The labels which are all the same, are also overlapped and, as a result, look like in Bold.
At each pair, we just need one label and thus one data mark. Now let's do some filtering.

5.Turn Index into Discrete and drag it to the Filters shelf. Set it to compute along Customer Names. Select its value to be 1.
This will select the first data mark and label at each pair partition, and leave out the rest. It reduces the number of data marks to 289. Now the numbers in the cross tab looks normal. No longer overlapped or in bold.

6.The matrix thus created is symmetrical versus the diagonal. Actually, (A,B) and (B,A) partitions are the same. The numbers on the diagonal are not needed because (A,A) is not a pair of distinct products. We need to filter them.

Let's create a Diagonal filter as follows.
  • Index < Index(Copy)
Drag the above filter to the Filters shelf. Right click it and select Table calculations. Set Index to compute along Sub-Category and Index (Copy) to compute along Sub-Category (Copy). Make sure the Diagonal filter is set to True.

This will keep only those numbers under the diagonal. That's all we need: 136 data marks.
BTW, Lukasz uses an integrated formula to create data densification and calculate the label of the first data mark:
  • If [Index]>0 and First()=0 then Size() end
This is an amazing formula that does a lot of work. In the above, I try to do this in multiple steps so that hopefully it is easier to explain.

That's all. There are always multiple ways to do one task by Tableau. The community has never ceased to amaze. Click here to access the workbook.
3

View comments

  1. An earlier post on this topic proposed a custom SQL based approach. It meant to shift some of the heavy calculations to the database, and reduce the data set size.
    http://vizdiff.blogspot.com/2016/05/counting-customers-who-bought-both-and-b.html

    ReplyDelete
  2. That was really helpful. Thanks

    ReplyDelete
  3. what is index representing in this?

    ReplyDelete

(Refresh the page if you want to view the gif image multiple times. Or go to Tableau Public and click the button at the top-right corner.)

Jake and I collaborated on a dashboard. He told me that he learnt a way to create an in-place help page in Tableau. He first saw it at a conference somewhere and couldn't recall who the speaker was. So I am blogging here about it but the credit goes to somebody else. If anyone knows who the original creator is, leave a comment below.

The key idea is to float a semi transparent worksheet on top of the dashboard, where a help text box is strategically placed on top of each chart. This way, we can explain how to view each chart and what data points are important, etc. This worksheet is collapsible by a show/hide button. 

Below I would like to show how this worksheet can be constructed.

1. Sheet with a single data mark.

  • Double click the empty space in Marks panel and add two single quotes. Make the null pill a text label. This creates a single null mark.
  • Set the view as "Entire View"

2. Create an show/hide button

  • Go to the target dashboard
  • Drag a floating vertical container to the dashboard, making it cover all the area of interest.
  • Drag the Single Null Mark sheet and drop it into the above container. Hide the sheet title.
  • Create an open/close button for the container and place the button at the top-right corner.

3. Add annotations

  • Format the sheet background opacity as 70% in the layout manager             
  • Select area annotations and place them anywhere of interest. 
  • Write help text and format it to highlight important messages.  
  • The text can serve as functional guide and/or insight guide.

Here is an example. Feel free to download the workbook and explore. Click the "i" button at the top-right corner to view the in-place help. 

0

Add a comment

Blog Archive
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.