This post is about an event in 2019. My Tableau Public dashboard was originally published on 7/16/2019.

A colleague gave a talk on a classic example of machine learning and data science: classifying Iris flowers from a simple set of observations. He did his demo in Python. I more or less understood what the example was about, and I figured we could recreate the same classification using Tableau's native clustering function. I presented the result to the team, and the visual approach was welcomed because it made the example much easier to understand.

The Iris data set is a famous one. It is used to validate all sorts of classification algorithms. The data is easy to find online: https://archive.ics.uci.edu/ml/datasets/iris

The Data Set

There are 150 flower records in total and 3 flower classes (setosa, versicolor, virginica). Each class has 50 records. Each flower has 4 measures: Petal Length, Petal Width, Sepal Length and Sepal Width.

The Approach

The idea is to use the 4 measures (Petal Length, Petal Width, Sepal Length and Sepal Width) to cluster the data. The resulting clusters are then compared with the actual flower classes. A good algorithm will:

  • Have the correct number of clusters (3 in this case)
  • Have few mismatches (a mismatch is where a flower of one class is clustered into another class)
We will use Tableau's native cluster method to classify the records.
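
To make the idea of a mismatch concrete, here is a minimal Python sketch. It assumes that each cluster is named after the class that appears most often in it and that every other record in that cluster counts as a mismatch. The function name `count_mismatches` and the toy labels are made up for illustration. This is only one possible way to count and is not taken from the dashboard.

```python
from collections import Counter

def count_mismatches(true_classes, cluster_ids):
    """Count records whose class differs from the majority class of their cluster."""
    # Group the true class labels by the cluster each record landed in.
    by_cluster = {}
    for cls, cid in zip(true_classes, cluster_ids):
        by_cluster.setdefault(cid, []).append(cls)

    mismatches = 0
    for members in by_cluster.values():
        # Assumption: the most common class in a cluster is that cluster's "name".
        _, majority_count = Counter(members).most_common(1)[0]
        mismatches += len(members) - majority_count
    return mismatches

# Made-up toy labels, not the real Iris results.
true_classes = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
cluster_ids = [1, 1, 2, 3, 3, 3]
print(count_mismatches(true_classes, cluster_ids))
```

In the toy data, one versicolor record ends up in the cluster dominated by virginica, so it is the only mismatch.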

Define The Number of Clusters and What Measures to Use

We have 4 measures available. We can feed the cluster algorithm any one of them, any 2, any 3 or all 4. We can predefine the number of clusters or let the algorithm decide. In this post, we will set the number of clusters to 3, the same as the number of known classes. You can vary the number of clusters if you wish.

We will use 2 of the measures at a time. Given 4 measures, we have 6 possible pairs. We then compare which pair of measures produces the best result, that is, the fewest mismatches.
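
As a quick check on the count of 6, the pairs can be enumerated in Python with `itertools.combinations`. This is only a sketch. The order it prints will not necessarily match the numbering of the pairings used on the dashboard.

```python
from itertools import combinations

measures = ["Petal Length", "Petal Width", "Sepal Length", "Sepal Width"]

# Every unordered pair of distinct measures: "4 choose 2".
pairs = list(combinations(measures, 2))

for first, second in pairs:
    print(first, "+", second)
print(len(pairs), "pairs in total")
```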

The Process

Pick the pair of Petal Width and Petal Length

1) Create a scatter plot using Petal Width, Petal Length and ID.
2) Go to the Analytics tab and drag Cluster to the canvas.
Cluster is the only method offered. After selecting it, we obtain the following chart, with 3 clusters by default. A dialog pops up where we can enter the number of clusters we want, or leave it on automatic, which gives 3 here.
We can also drag more measures into the box of variables or remove them from it. Doing so will change the resulting clusters. The minimum is one variable.

The 4 measures give 6 pairs of variables. We create a chart for each of them and put them all in a dashboard.

Details about the Cluster

For the curious, more details about the clusters are available. Right click on the "Cluster" pill and select "Describe Cluster". This opens the Summary and Model details of the clusters.

The Quality of Clustering: Comparison between 6 Pairings

By visual inspection, we see that the pair (2.Petal Length, Sepal Length) got 4 clusters by automatic clustering and (3.Petal Length, Sepal Width) got 2 clusters. In both cases the algorithm failed to find the correct number of clusters, which is 3.

The following pairings all resulted in 3 clusters. The first pairing is the best, with the fewest mismatches.

(1.Petal Width, Petal Length): 6 mismatches
(5.Petal Width, Sepal Length): 18 mismatches
(6.Petal Width, Sepal Width): 11 mismatches
(4.Sepal Width, Sepal Length): 30 mismatches
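
To put these counts in perspective, the short Python sketch below converts them into mismatch rates over the 150 records. The counts are the ones listed above. The variable names are just for illustration.

```python
TOTAL_RECORDS = 150

# Mismatch counts as reported above for the pairings that gave 3 clusters.
mismatches = {
    "Petal Width, Petal Length": 6,
    "Petal Width, Sepal Length": 18,
    "Petal Width, Sepal Width": 11,
    "Sepal Width, Sepal Length": 30,
}

# Share of the 150 records that landed in the wrong cluster.
rates = {pair: count / TOTAL_RECORDS for pair, count in mismatches.items()}

# Print from best (lowest rate) to worst.
for pair, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{pair}: {rate:.1%} mismatched")
```

The best pairing mismatches 6 of 150 records (4%), while the worst mismatches 30 of 150 (20%).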

Conclusion

We showed how to create clusters in Tableau using the well-known Iris data set. We only studied the cases where a pair of measures is used. 2 of the 6 cases didn't produce the expected number of clusters by automatic clustering. The remaining 4 cases produced 3 clusters, as expected. Among those 4, the best got 6 mismatches and the worst got 30. We will write another post on how to assess the quality of clustering.

Tableau's clustering method is not perfect. But it gives us a drag and drop tool for a quick look at the classification result. It's based on K-means clustering.
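
For readers who want to see the basic idea behind K-means, here is a minimal pure-Python sketch of the textbook assign-and-update loop on 2-D points. It is not Tableau's implementation. The function name `kmeans`, the toy points and the hand-picked starting centers are all assumptions for illustration, and it does not attempt to choose the number of clusters automatically.

```python
def kmeans(points, centers, iterations=10):
    """Textbook k-means on 2-D points; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # nothing moved, so the clusters are stable
        centers = new_centers
    return centers, labels

# Made-up (length, width) points, not real Iris records.
points = [(1.0, 0.2), (1.2, 0.3), (4.0, 1.3), (4.2, 1.4), (6.0, 2.2), (6.2, 2.3)]
# Hand-picked starting centers (an assumption; real tools choose these differently).
centers, labels = kmeans(points, centers=[(1.0, 0.2), (4.0, 1.3), (6.0, 2.2)])
print(labels)
```

With well-separated toy points like these, the loop settles after two passes and each neighboring pair of points shares a cluster. Real Iris measurements overlap between versicolor and virginica, which is where mismatches come from.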

(Refresh the page if you want to view the gif image multiple times. Or go to Tableau Public and click the button at the top-right corner.)

Jake and I collaborated on a dashboard. He told me that he learnt a way to create an in-place help page in Tableau. He first saw it at a conference somewhere and couldn't recall who the speaker was. So I am blogging here about it but the credit goes to somebody else. If anyone knows who the original creator is, leave a comment below.

The key idea is to float a semi-transparent worksheet on top of the dashboard, with a help text box strategically placed on top of each chart. This way, we can explain how to read each chart, which data points are important, and so on. The worksheet can be shown or hidden with a show/hide button.

Below I would like to show how this worksheet can be constructed.

1. Sheet with a single data mark.

  • Double click the empty space in the Marks panel and type two single quotes (''). Make the resulting null pill a text label. This creates a single null mark.
  • Set the view to "Entire View"

2. Create a show/hide button

  • Go to the target dashboard
  • Drag a floating vertical container to the dashboard, making it cover all the area of interest.
  • Drag the Single Null Mark sheet and drop it into the above container. Hide the sheet title.
  • Create an open/close button for the container and place the button at the top-right corner.

3. Add annotations

  • Set the sheet background opacity to 70% in the layout manager
  • Select area annotations and place them anywhere of interest.
  • Write help text and format it to highlight important messages.
  • The text can serve as a functional guide and/or an insight guide.

Here is an example. Feel free to download the workbook and explore. Click the "i" button at the top-right corner to view the in-place help. 