Recently Ćukasz Majewski has provided an ingenious solution using data densification. I am totally stumped. I am going to applying it to the same superstore example that I used before, and as a result, to document the details.
The objective of the analysis is to count customers who bought in both A and B subcategories of products.
The main steps are:
1.Duplicate Sub-Category field: Sub-Category (Copy). Drag them into columns and rows respectively. We see 17 data marks in the view.
2.Create Index (=Index()) and drag it into Details shelf. Set it to compute using Table Across or Subcategory. This populates the cross tab table with 17x17=289 data marks.
Each mark represents a partition by a pair of sub categories, including redundancies like (Accessories, Accessories),(Accessories,Art), (Art,Accessories).
It is equivalent to a cross join between Sub-Category and Sub-Category (Copy). We accomplished this using Self Join or Custom SQL. Obviously, this method via data densification requires the least operations.
3.Drag Customers Names into Details. This brings one more dimension into the view. At each pair partition, there are multiple customers, each with a mark. The data marks are overlapped and we don't see them. The number at the lower left corner shows it all: 50682 in total.
4.Create Size (=Size()) and drag it to Labels shelf. Set it to compute along Customer Names.
This Size() function shows how many data marks at each pair partition. The labels which are all the same, are also overlapped and, as a result, look like in Bold.
At each pair, we just need one label and thus one data mark. Now let's do some filtering.
5.Turn Index into Discrete and drag it to the Filters shelf. Set it to compute along Customer Names. Select its value to be 1.
This will select the first data mark and label at each pair partition, and leave out the rest. It reduces the number of data marks to 289. Now the numbers in the cross tab looks normal. No longer overlapped or in bold.
6.The matrix thus created is symmetrical versus the diagonal. Actually, (A,B) and (B,A) partitions are the same. The numbers on the diagonal are not needed because (A,A) is not a pair of distinct products. We need to filter them.
Let's create a Diagonal filter as follows.
- Index < Index(Copy)
This will keep only those numbers under the diagonal. That's all we need: 136 data marks.
BTW, Lukasz uses an integrated formula to create data densification and calculate the label of the first data mark:
- If [Index]>0 and First()=0 then Size() end
That's all. There are always multiple ways to do one task by Tableau. The community has never ceased to amaze. Click here to access the workbook.
View comments