Today, almost every data set in practice has geographical attributes.

In this post, I would like to discuss a way to analyse a geography-based data set by quartile partitioning. That is, we partition the data set into 4 equal-size groups, each covering a contiguous area.

The example data set will be the US Mass Shootings from 2013-2015. Note that in the analysis, we only focus on the 48 continental states of the USA.

1. The Quartile Approach

This is inspired by the widely used boxplot technique. Like a boxplot, we will cut a data set into 4 groups of equal size, hence the name quartile approach. A boxplot works along a single dimension. We will work with a 2-dimensional twist here: longitude and latitude. In 2 dimensions, there are multiple ways to cut the data set into contiguous quartiles, as I will explain below.
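As a rough illustration of one such cut, here is a minimal Python sketch. It assumes one particular scheme, which may not be the one used in the workbook: split the points into west and east halves at the median longitude, then split each half at its own median latitude. The function names and the sample coordinates are made up for illustration; they are not the incident data.

```python
def split_in_half(points, axis):
    """Sort points along one axis (0 = longitude, 1 = latitude) and cut at the middle."""
    ordered = sorted(points, key=lambda p: p[axis])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def quartile_partition(points):
    """Cut (longitude, latitude) points into 4 contiguous groups of near-equal size."""
    west, east = split_in_half(points, axis=0)   # first cut: longitude
    sw, nw = split_in_half(west, axis=1)         # second cut: latitude, west half
    se, ne = split_in_half(east, axis=1)         # second cut: latitude, east half
    return {"SW": sw, "NW": nw, "SE": se, "NE": ne}


# Hypothetical (longitude, latitude) points, not the real data set.
points = [
    (-122.4, 37.8), (-118.2, 34.1), (-112.1, 33.4), (-104.9, 39.7),
    (-95.4, 29.8), (-87.6, 41.9), (-80.2, 25.8), (-74.0, 40.7),
]

groups = quartile_partition(points)
for name, members in groups.items():
    print(name, len(members), members)
```

Cutting by latitude first and longitude second would give a different, equally valid set of quartiles, which is one reason the 2-dimensional case has more than one answer.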

US mass shootings have been a societal problem for quite some years. Gun violence is a problem that is particular to the US, where we have the best of technology and thinkers, but we don't know what to do with the guns.

Here is an effort to understand the distribution of mass shooting incidents. A mass shooting is defined as an incident where 4 or more people were shot, whether killed or wounded. The study will help us gain certain insights into the pattern of the problem.

I thought that dimension filtering and data blending are logically permutable. So that's how I drew the last diagram for the order of operations.

Zen Master Jonathan Drummey pointed out that the actual queries for dimension filtering take place before data blending. So here is a new version incorporating Jonathan's contribution, showing the precedence of dimension filters over data blending.

Within each filter category, there could be subcategories of filters.

We have worked to understand the order of operations in Tableau at a high level.

Actually, there are also filtering operations at a lower level. Here we are going to look into the order of operations within each of the dimension filter and the set filter.

A good understanding of the filtering options will let us take advantage of the versatile functionality of the dimension and set filters. In particular, the filters within are not affected by the dimensions in the view.

A histogram can be created in many ways. The de facto histogram is built with bars. With Index() we can create one that is a bit more colorful. Click the image below to see an interactive version or download the workbook. I will describe how to create this next.

This one is created as a Gantt chart, overlaid with a boxplot and colored by [Profit]. It shows the customer distribution sliced by the number of orders, based on the Superstore data set.

Both the histogram and the box-and-whisker plot are popular tools for describing the distribution of data, and each provides different insights into it. It's quite interesting to overlay one on the other.

Today, we will show how to put them together in one chart.

The above is an example using the Superstore data set. The histogram shows the distribution of the number of products by the number of units sold, sliced by subcategory.
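To make the two views concrete outside of Tableau, here is a small Python sketch that derives both from the same values: the histogram as a count per value, and the boxplot as quartiles plus whiskers. The numbers are made up rather than taken from the Superstore data, and the quartile method and the 1.5 × IQR whisker rule are assumptions that may not match the settings used in the workbook.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical units sold per product (not the Superstore data).
units_sold = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9]

# Histogram: how many products fall on each units-sold value.
histogram = Counter(units_sold)

# Boxplot: quartiles, then whiskers at the furthest points within 1.5 * IQR.
q1, q2, q3 = quantiles(units_sold, n=4, method="inclusive")
iqr = q3 - q1
lower_whisker = min(x for x in units_sold if x >= q1 - 1.5 * iqr)
upper_whisker = max(x for x in units_sold if x <= q3 + 1.5 * iqr)
outliers = [x for x in units_sold if x < lower_whisker or x > upper_whisker]

for value in sorted(histogram):
    print(f"{value:>2} | {'#' * histogram[value]}")
print("quartiles:", q1, q2, q3)
print("whiskers:", lower_whisker, upper_whisker, "outliers:", outliers)
```

Overlaying the two in one chart amounts to drawing the quartile and whisker marks on the same axis as the bars.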

I have written a few blog posts recently on the subject of data scaffolding. Let me summarize them.

Data scaffolding is a technique to artificially create a data structure for the purpose of visualization. It either reshapes the original data or blends multiple data sources in a way that allows better visualization.

The technique was pioneered by Tableau Zen Master Joe Mako.

The general methodology is as follows:

1. Create a table of pure dimensions to act as the primary data source.
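As a loose analogy in Python, the first step can be sketched as building every combination of the dimension values. The remaining steps are not listed here, so attaching a measure to the scaffold below is only an assumption about how the scaffold gets used, approximating a blend from a primary source with a lookup. The field names and figures are hypothetical.

```python
from itertools import product

# Step 1: a scaffold of pure dimensions (every region/month combination).
regions = ["East", "West"]
months = ["Jan", "Feb", "Mar"]
scaffold = [{"region": r, "month": m} for r, m in product(regions, months)]

# Hypothetical sparse data: most combinations have no record at all.
sales = {("East", "Jan"): 120, ("West", "Mar"): 80}

# Assumed next step: look up the measure for each scaffold row,
# keeping rows that have no match (value None) so gaps stay visible.
blended = [
    {**row, "sales": sales.get((row["region"], row["month"]))}
    for row in scaffold
]

for row in blended:
    print(row)
```

The point of the scaffold is that all 6 combinations appear in the result, even though the data only has 2 records.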

So far, we have talked about data blending via scaffolding: 1, 2, 3. Blending involves 2 or more data sources.

Data reshaping is about a single data source. By scaffolding, we can alter or transform the data structure in order to create visualizations that were not straightforward using the original data set.

Again, Zen Master Joe Mako has lectured about scaffolding in an hour-long video focusing on data reshaping, that is, dealing with a single data source. He has included 4 examples.

Since my last article on this topic, there has been some discussion. The real diagram could be more complicated than what has been drawn here, which may be only a rough approximation of it. Let's try to make it evolve towards the ultimate one.

So here are a few updates:

- Added Custom SQL, which is an integral part of Tableau's functionality. It can be used as the first filter and transformer for the raw data.

Right before the 2015 Tableau Conference, Rody Zakovich of the Tableau community forum initiated a series of appreciation pieces to a select group of members, including me:

Community Appreciation - Alexander Mou

I am very much moved. It meant a lot to me. I have been in the forum for over a year and I am greatly impressed by the amazing energy and camaraderie in the community. The people there are great and friendly. They can help you solve any Tableau problem.