Today, we will show how to put them together in one chart.
The histogram is built via Size(), following the approach described in an earlier article. The difference here is that we use an LOD expression to calculate the number of units sold per product:
- [Units Sold] = {FIXED [Product ID]: SUM([Number of Records])}
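For readers who want to check the logic outside Tableau, here is a minimal Python sketch of what the FIXED expression computes: a record count per Product ID, repeated on every row of that product. The sample rows and the names `records`, `units_sold_per_product`, and `rows_with_lod` are made up for illustration; this is an analogy, not Tableau's implementation.

```python
from collections import Counter

# Hypothetical order lines: one record per row, identified by Product ID.
records = [
    {"Product ID": "P1"},
    {"Product ID": "P1"},
    {"Product ID": "P2"},
    {"Product ID": "P3"},
    {"Product ID": "P3"},
    {"Product ID": "P3"},
]


def units_sold_per_product(rows):
    """Count records per Product ID, like SUM([Number of Records]) fixed at the product level."""
    return Counter(row["Product ID"] for row in rows)


units_sold = units_sold_per_product(records)

# A FIXED LOD value is the same on every row that belongs to the product.
rows_with_lod = [
    dict(row, **{"Units Sold": units_sold[row["Product ID"]]}) for row in records
]
```

Because the value is fixed at the product level, it can be used as a continuous axis no matter which other dimensions are in the view.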
With a continuous axis, we can create a box plot!
So it's pretty simple. Using the Histogram via Size() approach, we can create a histogram of the distribution of products (with Product ID as the dimension).
The marks are stacked. Note that it is possible to minimize the number of marks in the chart, but we can't filter out the nulls; otherwise the box plot statistics won't be correct.
1. The distribution of the number of products over the number of units sold.
2. The median and quartiles over the number of units sold.
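The two readings listed above can be sketched in Python as follows. The `units_sold` values are invented sample data, one value per product, and the statistics are computed on those underlying per-product values rather than on the bar heights. Quartile conventions differ between tools, so the numbers from `statistics.quantiles` may not match Tableau's box plot exactly.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical units sold per product (one value per Product ID).
units_sold = [1, 1, 2, 2, 2, 3, 3, 4, 5, 9]

# 1. Histogram: how many products fall on each units-sold value.
histogram = Counter(units_sold)

# 2. Box plot statistics over the per-product values.
q1, q2, q3 = quantiles(units_sold, n=4, method="inclusive")

for value in sorted(histogram):
    print(f"{value:>2} units sold: {'#' * histogram[value]}")
print(f"Q1={q1}, median={q2}, Q3={q3}")
```

The histogram bars and the quartiles share the same horizontal axis (units sold), which is what lets the two be overlaid in one chart.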
The idea above came to me while playing with a box plot over jitter, as shown in this blog. The jittered dots are visually appealing. They show the sample density distribution visually, much like a histogram, but without quantifying it. I found that the dots can be organized as a histogram.
The jitter is generated using Index(). We can also use Index() to create a histogram.
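As a rough illustration of the index idea, the Python sketch below gives each dot a 1-based running index within its column, so the dots stack into histogram bars instead of scattering. The function `stack_by_index` and the sample values are hypothetical; the running counter only mimics an Index() that restarts for each value on the axis.

```python
from collections import defaultdict


def stack_by_index(values):
    """Pair each value with a 1-based running index within its own column."""
    seen = defaultdict(int)
    positions = []
    for value in values:
        seen[value] += 1
        positions.append((value, seen[value]))
    return positions


# Hypothetical units sold per product, in arbitrary order.
units_sold = [2, 1, 2, 3, 2, 1]
dots = stack_by_index(units_sold)

# The height of each bar is the largest index reached in its column.
heights = {}
for value, position in dots:
    heights[value] = max(heights.get(value, 0), position)
```

Plotting each dot at (value, index) gives one mark per product, and the top of each stack reads as the histogram count.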
PS. The bar chart doesn't have to be a histogram; it can show another measure. Here is the average product price over a box plot.


As a use case, I just added box plots to Shine's viz: https://twitter.com/vizshine/status/642707876348317696
I would like to point out that each bar in the histogram has multiple marks stacked up. See this post on marks in a histogram: http://vizdiff.blogspot.com/2015/05/histogram-via-size.html
By lowering the color opacity, we make the stacked marks exhibit varying color intensity according to the number of data samples. It becomes a heatmap. Vertically, the scale is different for each state, but the intensity gives a hint of the difference in the number of samples.
This does not actually represent the distribution of the customers. Notice that the middle of the box plot is always on the center bar, no matter how skewed the distribution is. By default reference lines are computed on the aggregate values, not the underlying values.
Thanks for pointing out the errors in the initial examples. I have just re-created the example.
Click image to see the viz.