The box plot is a highly effective tool for data analysis, invented by the great statistician Professor John Tukey. It gives us the following:
- Partition of the data by quartile.
- Visual spread of each quartile.
- Descriptive statistics: max, min, median, upper and lower quartiles.

Quartiles are a coarser level of detail that lets us understand data at a summary or aggregate level. What makes the box plot so popular is its simplicity. Maybe four is the magic number of partitions that we humans can grasp most quickly.

The box plot has been widely used to gain insight into one-dimensional data. By applying it to spatial data, we add one more dimension to the box plot. This allows us to create a quartile-based summary view of spatial data.

Below is an example in which we applied the technique. Click image to view or download the interactive workbook.
The example visualizes the distribution of disease rates (cases/population) across the counties of California. The rates are partitioned into quartiles, and the quartiles are then used to color the map. This helps us gauge the distribution and gain instant insight into the data at a summary level.

Next, we give the details of the calculations behind this application of the box plot.

The major steps for creating the chart are:
1. Drag County to the Detail shelf.
2. Create a percentile ranking for Rate.
3. Create a calculated field, Quartile, based on the percentile.
4. Drag Quartile to the Color shelf.
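
Steps 2 and 3 can also be sketched outside Tableau. The Python below is only an illustration, not the calculated fields from the workbook: the county names and rates are made up, the tie handling in `percentile_rank` is an assumption, and so are the 0.25/0.50/0.75 cut points in `quartile`.

```python
def percentile_rank(values):
    """Return each value's percentile rank in [0, 1].

    Assumption: tied values share the rank of the highest tied position.
    """
    n = len(values)
    if n < 2:
        return [0.0 for _ in values]
    return [(sum(v <= x for v in values) - 1) / (n - 1) for x in values]


def quartile(pct):
    """Map a percentile rank to a quartile number 1-4 (cut points assumed)."""
    if pct <= 0.25:
        return 1
    if pct <= 0.50:
        return 2
    if pct <= 0.75:
        return 3
    return 4


# Hypothetical rates (cases / population); not the data from the workbook.
rates = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0,
         "E": 5.0, "F": 6.0, "G": 7.0, "H": 8.0}

ranks = percentile_rank(list(rates.values()))
quartiles = {county: quartile(p) for county, p in zip(rates, ranks)}
print(quartiles)
```

Each county ends up with a quartile number from 1 to 4, which is the value that would drive the map color.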

We are basically done here. Simple stuff.

We can see that the red areas have a high incidence of chlamydia. They include San Francisco, San Diego and Los Angeles counties.

A few extra and optional steps can be included to help illustrate the data.

Legends

The spread of each quartile is an important feature of the viz. The legend is a good place to display the data range.

The range calculation involves some table calculations for max & min of each quartile. Here is how we calculate the max for each quartile:
The range is not actually placed on the legend label, because the colors could then change when the data changes. Instead, we put it in a separate table and use a bar chart to visualize the quartile spread.
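
As a rough equivalent outside Tableau, the min, max and spread of each quartile can be computed by grouping the rates by quartile. This is a sketch, not the table calculation used in the workbook; the `(rate, quartile)` rows are hypothetical.

```python
from collections import defaultdict


def quartile_ranges(rows):
    """Group (rate, quartile) rows and return {quartile: (min, max, spread)}."""
    grouped = defaultdict(list)
    for rate, q in rows:
        grouped[q].append(rate)
    return {
        q: (min(vals), max(vals), max(vals) - min(vals))
        for q, vals in sorted(grouped.items())
    }


# Hypothetical (rate, quartile) pairs; not the data from the workbook.
rows = [(1.0, 1), (2.0, 1), (3.0, 2), (4.0, 2),
        (5.0, 3), (6.0, 3), (7.0, 4), (8.0, 4)]

ranges = quartile_ranges(rows)
for q, (lo, hi, spread) in ranges.items():
    print(f"Q{q}: {lo} to {hi} (spread {spread})")
```

The spread (max minus min) is the quantity the bar chart displays for each quartile.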

Tooltips

We can put those descriptive stats in the tooltips if we wish. Here is how we calculate them:
The resulting tooltip looks like this:
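
For readers without the workbook, the same five descriptive statistics can be approximated in Python. This is a sketch under assumptions: the rates are made-up values, and the `inclusive` interpolation method of `statistics.quantiles` may not match how Tableau computes quartiles.

```python
import statistics


def five_number_summary(values):
    """Return min, lower quartile, median, upper quartile and max."""
    data = sorted(values)
    # Assumption: inclusive interpolation; other methods give other cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "lower_quartile": q1,
        "median": q2,
        "upper_quartile": q3,
        "max": data[-1],
    }


# Hypothetical rates; not the data from the workbook.
rates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
summary = five_number_summary(rates)
print(summary)
```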

Discrete vs Continuous Color

The above uses discrete colors for the quartiles. Since the quartiles are actually partitions of a measure, we also have the option of using a continuous color scheme. In that case we use this formula to designate the quartiles:
And this is how it looks with continuous color. The continuous color scheme is actually more intuitive! It shows the contrast between quartiles. Pick discrete or continuous color at your own discretion. Click image to view or download the interactive workbook.

Box plot as reference

You may notice that we put a box plot on the viz. It is there for reference. Through action filters, we can see exactly where each county's disease rate falls on the scale.

That's all.
Comments

  1. Hi Alexander, thanks for sharing this idea. This is very helpful because Tableau does not provide a customized legend for the map as ArcGIS does. I downloaded your workbook and tried to understand the details. I kinda see that you use QuartOrder to control the label shown on the legend, but I don't understand how it actually works. Can you please explain it a little bit?

    Thanks,
    Yanning

    1. In the above, I showed how to calculate the max per quartile. You can calculate the min the same way. Max - Min = spread of the quartile. Set the calculation to compute along County. Does this answer your question?

    2. I see you have a calculated field "QuartOrder=(RANK( [QSize] )-RANK_UNIQUE([QSize])+1)" in the filter for the quartile rate. How does this actually work to control the "legend label"?

      Thanks for the help :)

    3. It is explained here
      http://vizdiff.blogspot.com/2015/05/histogram-via-rank-functions.html


(Refresh the page if you want to view the gif image multiple times. Or go to Tableau Public and click the button at the top-right corner.)

Jake and I collaborated on a dashboard. He told me that he learnt a way to create an in-place help page in Tableau. He first saw it at a conference somewhere and couldn't recall who the speaker was. So I am blogging here about it but the credit goes to somebody else. If anyone knows who the original creator is, leave a comment below.

The key idea is to float a semi-transparent worksheet on top of the dashboard, with a help text box strategically placed over each chart. This way, we can explain how to read each chart, which data points are important, and so on. The worksheet can be shown or hidden with a show/hide button.

Below I would like to show how this worksheet can be constructed.

1. Sheet with a single data mark.

  • Double-click the empty space in the Marks card and type two single quotes. Make the resulting null pill a text label. This creates a single null mark.
  • Set the view to "Entire View".

2. Create a show/hide button

  • Go to the target dashboard
  • Drag a floating vertical container to the dashboard, making it cover all the area of interest.
  • Drag the Single Null Mark sheet and drop it into the above container. Hide the sheet title.
  • Create an open/close button for the container and place the button at the top-right corner.

3. Add annotations

  • Set the sheet background opacity to 70% in the layout manager.
  • Select area annotations and place them wherever they are needed.
  • Write the help text and format it to highlight important messages.
  • The text can serve as a functional guide, an insight guide, or both.

Here is an example. Feel free to download the workbook and explore. Click the "i" button at the top-right corner to view the in-place help. 
