1. There has been a calculation that is needed in market basket and correlation analysis. There is a KB article about the topic (by self-join) and I wrote a post about it before (by custom SQL).

    Recently Łukasz Majewski has provided an ingenious solution using data densification. I am totally stumped. I am going to applying it to the same superstore example that I used before, and as a result, to document the details.

    The objective of the analysis is to count customers who bought in both A and B subcategories of products.

    The main steps are:
    1.Duplicate Sub-Category field: Sub-Category (Copy). Drag them into columns and rows respectively. We see 17 data marks in the view.
    2.Create Index (=Index()) and drag it into Details shelf. Set it to compute using Table Across or Subcategory. This populates the cross tab table with 17x17=289 data marks.

    Each mark represents a partition by a pair of sub categories, including redundancies like (Accessories, Accessories),(Accessories,Art), (Art,Accessories).

    It is equivalent to a cross join between Sub-Category and Sub-Category (Copy). We accomplished this using Self Join or Custom SQL. Obviously, this method via data densification requires the least operations.
    3.Drag Customers Names into Details. This brings one more dimension into the view. At each pair partition, there are multiple customers, each with a mark. The data marks are overlapped and we don't see them. The number at the lower left corner shows it all: 50682 in total.
    4.Create Size (=Size()) and drag it to Labels shelf. Set it to compute along Customer Names.

    This Size() function shows how many data marks at each pair partition. The labels which are all the same, are also overlapped and, as a result, look like in Bold.
    At each pair, we just need one label and thus one data mark. Now let's do some filtering.

    5.Turn Index into Discrete and drag it to the Filters shelf. Set it to compute along Customer Names. Select its value to be 1.
    This will select the first data mark and label at each pair partition, and leave out the rest. It reduces the number of data marks to 289. Now the numbers in the cross tab looks normal. No longer overlapped or in bold.

    6.The matrix thus created is symmetrical versus the diagonal. Actually, (A,B) and (B,A) partitions are the same. The numbers on the diagonal are not needed because (A,A) is not a pair of distinct products. We need to filter them.

    Let's create a Diagonal filter as follows.
    • Index < Index(Copy)
    Drag the above filter to the Filters shelf. Right click it and select Table calculations. Set Index to compute along Sub-Category and Index (Copy) to compute along Sub-Category (Copy). Make sure the Diagonal filter is set to True.

    This will keep only those numbers under the diagonal. That's all we need: 136 data marks.
    BTW, Lukasz uses an integrated formula to create data densification and calculate the label of the first data mark:
    • If [Index]>0 and First()=0 then Size() end
    This is an amazing formula that does a lot of work. In the above, I try to do this in multiple steps so that hopefully it is easier to explain.

    That's all. There are always multiple ways to do one task by Tableau. The community has never ceased to amaze. Click here to access the workbook.
    3

    View comments

  2. ASCII (and Unicode) characters can offer a variety of lightweight data marks. Judicious choices of context-sensitive data marks can create more interest in the eyes of the viewer and lead to better visualizations.

    Especially in tooltips, we are limited to using ASCII/Unicode characters. Some of them can be used as data marks to provide data visualizations. ASCII art have been around for a long while. With ASCII, we can create very expressive art form if we dare to be creative.

    For example, in restaurant ratings, we have seen expressions like
    • Expensive: $$$
    • Moderate: $$
    • Cheap: $
    Depending on countries and being consistent with context, we can replace the above $ by € or ‎¥.

    In various ratings such as movies or product reviews, we have seen stars being widely used.
    • ✰✰
    • ✰✰
    • ✰✰
    So this is not something new. What is new is we can use them in Tableau's tooltips to create lightweight data visualization that complements and enhance the primary charts. This has been advocated by Andy Cotgreave. Recently new Zen Master Rody Zakovich showed some more tricks. Especially the formula Replace(Space(),' ','█') is very convenient for creating a long string of any character. 

    In a recent viz of the day on 10/18/2016, I found that the vertical bars are hard to compare in height between Clinton and Trump numbers. This can be remedied by adding a little ASCII chart in tooltips.

    Note that we used ▲ and ▼ as data marks to represent upward/downward trends. Also we added green color for up and red color for down. This seems more context sensitive than the bar chart based on '█'.

    That concludes the tweak today. Click the images for interactive versions.


    0

    Add a comment

  3. A little question about Markov Chain came up lately. That pushed me to have a closer look at what it does. I had a vague idea about it before. This time I understand it better.

    I saw a question in StackOverflow where people seems favoring a solution in SQL. I found it pretty easy to calculate in Tableau and with better visualization.

    So, the matrix includes the probabilities of a group of entities transitioning from one state to the next state. The transition will happen within the same set of states.

    Each row of the data set contains the old and new states per entity. We only need to calculate, given an old state, what is the probability of becoming a new state, or the distribution of probabilities of becoming any state. This article explains pretty well what the Markov Chain is.

    So, I will use the data set in the above StackOverflow question as an example. Then I create the number of records and state-to-state probability in matrices.
    Note that in the fractions, the denominators are calculated using row totals. The row totals can be computed using either Sum({Fixed Year13: Sum(Number of Records)}) or Window_Sum(Sum(Number of Records)) computing using Year14.

    Further visualization is of the following varieties:
    Click the above images to view or download the interactive versions.
    1

    View comments

  4. After a first-time visit to the village in Lichuan, Hubei, Central China where my grandpa was born, I am back. I gained some new perspectives on the changes in the past hundred years.

    The Viz of the Day on 10/12/2016 is about the spells in Harry Potter books. I didn't know that spells could be this powerful such that the author Skylar Johnson is invoked to create this special viz. Very nice work! I see that even the labels match the colors of the data marks, which is a new feature in Tableau 10.
    One thing I found possible to tweak is the histogram bar chart at the upper part of the viz. It shows the number of spells in each of the 7 volumes of Harry Potter. Under it, is the circle chart where we also see colored divisions by book. I feel that they should be aligned.
    So here is the result of the tweaking. The data marks I used are lines whose sizes are proportional to the number of spells. The lengths of the lines also align with the page positions.
    Click the above images to view and download the interactive versions. Below is an enlarged view of the histogram on spells per book. The axis are custom adjusted to align with the circle chart.

    0

    Add a comment

Blog Archive
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.