- Partition of data by quartile.
- Visual spread of each quartile.
- Descriptive statistics: max, min, median, upper & lower quartiles.
Quartiles offer a higher level of detail that lets us understand data at a summary or aggregate level. What makes the box plot so popular is its simplicity. Perhaps four is the optimal number of partitions for us humans to grasp quickly.
The box plot has been widely used to gain insight into one-dimensional data. By applying it to spatial data, we add one more dimension to the box plot. This allows us to create a quartile-based summary view of spatial data.
Below is an example in which we applied the technique. Click the image to view or download the interactive workbook.
The example above visualizes the distribution of disease rates (cases/population) across the counties of California. The rates are partitioned into quartiles, and the quartiles are then used to color the map. This helps us gauge the distribution and gain instant insight into the data at a summary level.
Next, we give details of the calculations behind this application of the box plot.
The major steps for creating the chart are:
1. Drag County to the detail shelf.
2. Create a percentile ranking for Rate.
3. Create a calculated field Quartile based on the percentile.
4. Drag Quartiles to the Color shelf.
We are basically done here. Simple stuff.
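For readers who want to check the logic outside Tableau, steps 2 and 3 can be sketched in Python. This is only an illustration under stated assumptions: the percentile rank is taken to be (number of values <= x, minus 1) / (n - 1), the county names and rates are made up, and the function names are hypothetical. The workbook's own calculated fields may differ.

```python
import math

def percentile_rank(values):
    """Percentile rank in [0, 1] for each value; tied values share the higher rank.

    Assumed definition: (count of values <= x, minus 1) / (n - 1).
    """
    n = len(values)
    if n == 1:
        return [1.0]
    return [(sum(v <= x for v in values) - 1) / (n - 1) for x in values]

def quartile(p):
    """Map a percentile rank in [0, 1] to a quartile number 1..4."""
    return max(1, math.ceil(p * 4))

# Hypothetical rates (cases / population); not the data from the workbook.
rates = {"A": 0.0012, "B": 0.0031, "C": 0.0027, "D": 0.0045,
         "E": 0.0019, "F": 0.0038, "G": 0.0022, "H": 0.0050}

ranks = percentile_rank(list(rates.values()))
quartiles = {county: quartile(p) for county, p in zip(rates, ranks)}

for county in sorted(quartiles, key=rates.get):
    print(county, rates[county], "Q%d" % quartiles[county])
```

With eight distinct rates, each quartile receives two counties. The quartile number is what would go on the Color shelf.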
We can see that the red areas have a high incidence of chlamydia; they include San Francisco, San Diego and Los Angeles counties.
A few optional extra steps can be included to help illustrate the data.
Legends
The spread of each quartile is an important feature of the viz. The legend is a good place to display the data range.
The range calculation involves some table calculations for the max and min of each quartile. Here is how we calculate the max for each quartile:
The range is not actually placed on the legend label; otherwise the colors might change as the data changes. So we put it in a separate table, and we use a bar chart to visualize the quartile spread.
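The per-quartile range can also be sketched in Python as a cross-check. This is not the table calculation used in the workbook, just the same idea (max, min, and spread = max - min within each quartile) on made-up data; `quartile_ranges` is a hypothetical helper name.

```python
from collections import defaultdict

def quartile_ranges(rows):
    """rows: iterable of (rate, quartile) pairs.

    Returns {quartile: (min, max, spread)} with spread = max - min.
    """
    groups = defaultdict(list)
    for rate, q in rows:
        groups[q].append(rate)
    return {q: (min(r), max(r), max(r) - min(r))
            for q, r in sorted(groups.items())}

# Hypothetical rates per 10,000 population with quartiles already assigned.
rows = [(12, 1), (19, 1), (22, 2), (27, 2),
        (31, 3), (38, 3), (45, 4), (50, 4)]

ranges = quartile_ranges(rows)
for q, (lo, hi, spread) in ranges.items():
    print(f"Q{q}: {lo} to {hi} (spread {spread})")
```

Each output row corresponds to one bar in the separate range table described above.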
Tooltips
We can put those descriptive stats in the tooltips if we wish. Here is how we calculate them:
The resulting tooltip looks like this:
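As a rough cross-check of the tooltip statistics, the same five numbers (max, min, median, upper and lower quartiles) can be computed with Python's standard library. The data is made up, and `statistics.quantiles` uses its default "exclusive" interpolation method here, which is an assumption; Tableau may compute quartile boundaries differently, so values can differ slightly.

```python
import statistics

def box_plot_stats(values):
    """Return the five descriptive statistics shown by a box plot."""
    # quantiles(n=4) returns the three cut points between the four quartiles.
    lower, _, upper = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "lower_quartile": lower,
        "median": statistics.median(values),
        "upper_quartile": upper,
        "max": max(values),
    }

# Hypothetical rates per 10,000 population; not the data from the workbook.
rates = [12, 19, 22, 27, 31, 38, 45, 50]
stats = box_plot_stats(rates)
for name, value in stats.items():
    print(f"{name}: {value}")
```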
Discrete vs Continuous Color
The above uses discrete colors for the quartiles. Since the quartiles are actually partitions of a measure, we also have the option of using a continuous color scheme. In that case we use this formula to designate quartiles:
And this is how it looks with continuous color. The continuous color scheme is arguably more intuitive, since it shows the contrast between quartiles. Pick discrete or continuous color at your own discretion. Click the image to view or download the interactive workbook.
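To illustrate why a numeric quartile works with a continuous color scheme, here is a small Python sketch that maps quartile numbers 1 to 4 onto a two-color ramp by linear interpolation. It is not the formula from the workbook; the function names and the light-yellow to dark-red endpoints are assumptions chosen for illustration.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t is in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))

def quartile_color(q, low=(255, 255, 204), high=(189, 0, 38)):
    """Map a numeric quartile 1..4 onto a continuous two-color ramp.

    The endpoint colors are hypothetical; any ramp works the same way.
    """
    return lerp_color(low, high, (q - 1) / 3)

colors = {q: quartile_color(q) for q in (1, 2, 3, 4)}
for q, rgb in colors.items():
    print(f"Q{q}: rgb{rgb}")
```

Because the quartile is treated as a number rather than a category, neighboring quartiles get neighboring shades, which is what makes the contrast between them easy to read.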
Box plot as reference
You may notice that we put a box plot on the viz. It is there for reference. Through action filters, we can see exactly where the disease rate of each county falls on the scale.
That's all.
Hi Alexander, thanks for sharing this idea. This is very helpful because Tableau does not provide a customized legend for the map as ArcGIS does. I downloaded your workbook and tried to understand the details. I kinda see that you use QuartOrder to control the label showing on the legend, but I don't understand how it actually works. Can you please explain it a little bit?
Thanks,
Yanning
In the above, I showed how to calculate the Max per quartile. You can calculate the Min the same way. Max - Min = spread of the quartile. Set the calculation to compute along County. Does this answer your question?
I see you have a calculated field "QuartOrder=(RANK( [QSize] )-RANK_UNIQUE([QSize])+1)" in the filter for the quartile rate. How does this actually work to control the "legend label"?
Thanks for the help :)
It is explained here:
http://vizdiff.blogspot.com/2015/05/histogram-via-rank-functions.html