Christine suggested that I take a look at Simpson's Paradox, following my recent posts on Anscombe's Quartet and the Datasaurus Dozen. They are all about learning to look at statistics in an impartial way.

Simpson's Paradox is about the difference between the stats of an entire data set and the stats of the same data set sliced by a dimension. They can be quite different or even contradictory. We can't take one for the other.

We are going to show some visualization techniques to compare the whole vs the parts through two examples.

UC Berkeley Admission Gender Bias

The data is from here. From the campus-wide totals, we see that the overall admission rate is 39%. The men's rate is 45% and the women's is 30%. So it seems that there is a campus-wide bias against women.
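To make the whole-vs-parts comparison concrete, here is a minimal Python sketch. It assumes the commonly circulated fall 1973 figures for Berkeley's six largest departments (labelled A to F, as in R's `UCBAdmissions` dataset), which may not be exactly the data linked above, although they reproduce the same 39% / 45% / 30% totals. The names `ADMISSIONS`, `rate` and `campus_rate` are made up for illustration.

```python
# Assumed data: (admitted, rejected) per department and gender,
# fall 1973, six largest departments (as in R's UCBAdmissions).
ADMISSIONS = {
    "A": {"men": (512, 313), "women": (89, 19)},
    "B": {"men": (353, 207), "women": (17, 8)},
    "C": {"men": (120, 205), "women": (202, 391)},
    "D": {"men": (138, 279), "women": (131, 244)},
    "E": {"men": (53, 138), "women": (94, 299)},
    "F": {"men": (22, 351), "women": (24, 317)},
}


def rate(admitted, rejected):
    """Admission rate as a fraction of all applicants."""
    return admitted / (admitted + rejected)


def campus_rate(genders):
    """Admission rate over all departments for the given genders."""
    admitted = sum(ADMISSIONS[d][g][0] for d in ADMISSIONS for g in genders)
    rejected = sum(ADMISSIONS[d][g][1] for d in ADMISSIONS for g in genders)
    return rate(admitted, rejected)


# The whole: campus-wide rates.
overall = campus_rate(["men", "women"])
men_overall = campus_rate(["men"])
women_overall = campus_rate(["women"])
print(f"All: {overall:.0%}  Men: {men_overall:.0%}  Women: {women_overall:.0%}")

# The parts: the same rates sliced by department.
women_ahead = []
for dept, by_gender in ADMISSIONS.items():
    men = rate(*by_gender["men"])
    women = rate(*by_gender["women"])
    if women > men:
        women_ahead.append(dept)
    print(f"Dept {dept}: men {men:.0%}, women {women:.0%}")

print("Departments where women's rate is higher:", women_ahead)
```

With these assumed figures, the campus totals favor men, yet most of the individual departments admit women at a higher rate. That reversal between the whole and the parts is the paradox.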

#TweakThursday: From time to time I tweak someone else's public viz and try to improve it according to my own subjective view.

When should one use horizontal bars and when vertical bars? How should time-based multiples be ordered in a trellis chart?

Here are my own rules of thumb:

Vertical bars are for time-based trends.
Horizontal bars are for categorical comparison.
Always place the latest cell in a time-based trellis at the top-left corner where the focus is.

This post is about 13 data sets, known as the Datasaurus Dozen, that have the same stats but different distributions. Stats can be deceiving, while data visualization can make a big difference.

Inspired by Anscombe's quartet and Alberto Cairo's Datasaurus, Justin Matejka and George Fitzmaurice crafted another 12 datasets which have the same stats and different distributions. Thus the Datasaurus and the Dozen.

Francis Anscombe, a British statistician and a professor at Princeton and Yale, constructed 4 different sets of data which all have the same stats, known as Anscombe's quartet. However, the quartet's data distributions are quite different.

Stats alone can be deceiving. Through data visualization, we can gain powerful insights into their differences. 

So, I decided to render Anscombe's quartet in Tableau. All calculations are based on Tableau's native functions.
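For readers without Tableau, the "same stats" claim can also be checked with a short Python sketch. This is not the Tableau workbook from the post. It uses the standard published values of Anscombe's quartet and only the standard library, and the names `QUARTET`, `pearson_r`, `summarize` and `SUMMARY` are made up for illustration.

```python
from statistics import mean, variance

# Anscombe's quartet: sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / (sxx * syy) ** 0.5


def summarize(xs, ys):
    """Summary stats that are (nearly) identical across the quartet."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARY.items():
    line = ", ".join(f"{key}={value:.3f}" for key, value in stats.items())
    print(f"Set {name}: {line}")
```

The printed rows agree to about two decimal places, which is exactly why the summary alone cannot tell the four sets apart and a scatter plot of each set is needed.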