Vizible Difference: March 2015

Mar

23

Tableau at Twitter

Last Thursday on 3/19/2015, I went to attend San Francisco Bay Area Tableau User Group's meetup, which was hosted by Twitter.

It was a great event enlisting 4 excellent speakers, including Tableau luminaries like Allan Walker and Anya A'Hearn, a.k.a, Zen Masters, MCed impeccably by Ashley Ohmann.

Dan Seisun of Twitter made a great presentation on the infrastructure of data visualization using Tableau at the company. He shared invaluable best practices of running tableau servers, educating users, instituting Tableau champions and making corporate BI as self services.

I was able to capture his presentation on my phone, because the fonts were big enough seen from where I sat. Though succinct, not every one of his slides contains less than 160 characters.

Below are the photos packaged in a viz. It's a bit slow to view because the pictures are quite big. Just be patient. BTW, I got to know that a viz's maximum height is 4000 pixels. So, I had to leave out a few slides.

The only regret of the day, if any, was that I didn't send a tweet while at Twitter's headquarter.

Posted 23rd March 2015 by Alexander Mou

0
Add a comment
Mar

21

The Nuance in Top N

If you think you understand what Top N means, think again. Here is a little story.

Boss: This year we have done really well. We are going to send our top 10 sales guys to the Moon. Give me the list.

Analyst RM: After running my Tableau, I can only find 5 guys in the top 10.
Boss: What? You are fired.

Analyst R: I found 11 guys in the top 10.
Boss: Are you serious? We do not have budget for one more person. You are fired.

Analyst RD: There are 20 people in the top 10.
Boss: Are you out of your mind? You are fired.

Analyst RU: I got exactly 10 guys in the top 10.

RU keeps his job at the expense of some sales guys who did just as well as those who made his top 10.

The full names of the analysts are as follows:
RM: Rank_Modified()
R: Rank()
RD: Rank_Dense()
RU: Rank_Unique()

The moral of the story is that Top N can mean different things. It really depends on which Tableau rank function is used.

In terms of the number of records in Top N, the rule of thumb is

Rank_Danse >= Rank >= N >= Rank_Modified
and
consistently, Rank_Unique = N

See the picture below as the illustration of how the story could happen. The workbook can be found here. An earlier review of all the rank functions was done from a different perspective: Part 1 and Part 2.

Posted 21st March 2015 by Alexander Mou

4
View comments
Mar

16

Selected as Best of Tableau Web ... Feb 2015

One of my post Dual Axis: To Be or Not To Be is selected as Best of The Tableau Web ... Feb 2015. Thanks to the Tableau community for the inspirations and the recognition!

Previous recognitions include:

Best of The Tableau Web in January 2015:
Embed Youtube in Tableau with Full Screen

Viz of the Day on Dec 19, 2014:
Turing Award winners 1966-2013

Best of the Tableau Web in October 2014:
Color your way to visual finesse

Best of The Tableau Web in August 2014:
Revisit of 5 rank functions in Tableau

Posted 16th March 2015 by Alexander Mou

0
Add a comment
Mar

15

Grouping Lower % Bars in Bar Chart
After finishing Grouping Lower % Slices in Pie Chart, I feel it is necessary to write a companion post about the bar chart counterpart. Bar chart is still the most popular chart type of all time. Tableau Zen Master Joe Mako wrote a post on grouping low ranked bars into one, which is great inspiration to me.

I will use the same data in pie chart example which makes things easier. The statement of the problem is how to group the lower % bars into a single one after a sorted measure. In the example, we need to group those lower % sales countries into a single label: "Others".

Create the Bar Chart

1.Create a parameter [N% Chooser] which allows us to choose any % as threshold. Show the parameter control. One can skip this if the percentage is a given number.

2.Create a new field [Top % + Others Label] which labels all low % slices as "Others". Drag it to the Row shelf. Note that the percentage is dynamically controlled by the parameter 0.01*[N% Chooser]. One can replace it by a given number if necessary.

3.Drag the [Country/Region] to the Detail shelf.

4.Create a new field [Top % + Others Sales] which sums up the sales of all Others into one value. We will explain the logic later. Drag it to the Column Shelf. Right click the pill and select Compute Using>Country/Region.

Now we got a barebone bar chart with an "Others" bar. Dressup is in order and will be done next.

Dress Up

- Click the descending sort button to sort the bars.

- Click the "42 nulls" at the down-right corner and you will see a pop up. Click Filter Data to get rid of the Nulls. A green pill will appear in the filter shelf.

- Create [Percent] field and drag it to the Label shelf. Right click on it and select Format to make the numbers in %.
- [Top % + Others Sales]/Total(Sum([Sales]))
Now we are essentially done. Further cleanup is recommended:

- Create a [Is Top %] field as follows. Drag it to the Row shelf and place it before [Top % + Others Label]. In the table, drag True to above False. Then right click it to uncheck Show Header.
- [Top % + Others Labels] !="Others"
- Drag [Is Top %] to the Color shelf. Now the "Others" bar is at the bottom of the chart with a distinct color.

- Right click the horizontal axis and uncheck "Show Header".

- Right click the canvas and select Format. Go the grid menu and turn both Row and Column Divider to None. This will remove some unnecessary horizontal and vertical lines.

And it's done! The workbook can be found here.

Calculating [Top % + Others Sales] and Filtering

To help understand the calculation, we attached a table in the above workbook. The table values are sorted descendingly according to sales. We see that all the lower % countries are labeled with "Others". And only the last country in "Others" has values. The rest of countries have Nulls.

The calculation is as follows:

if [Top % + Others Labels]="Others"
then if Last()=0
then Window_Sum(if [Top % + Others Labels]="Others" then Sum([Sales]) end)
end
else SUM( [Sales] )
end

It shows that, for the new label "Others", we need a different calculation. Otherwise it's just Sum([Sales]).

Widow_Sum() will only sum up the sales of those countries labeled with "Others".

We only need one value which sums up all the "Others". Last()=0 will do that. The rest is filtered out.

Voila!

Top N & Grouping Others Series

Grouping Lower Ranked Bars in Bar Chart

Grouping Lower Ranked Slices in Pie Chart

Grouping Lower % Bars in Bar Chart

Grouping Lower % Slices in Pie Chart

Dynamic Pie Chart: Top N, Sorting Slices and Grouping Others
Posted 15th March 2015 by Alexander Mou

3
View comments
Mar

15

Grouping Lower % Slices in Pie Chart
[This post is long over due. The Super Pi Day reminds me of it and thus this Pie Chart post is dedicated to the Day!]

[A sequel has been written on Grouping Lower % Bars in Bar Chart It explains the calculation from a slightly different perspective.]

A post of couple of months old in Tableau Forum has re-surfaced thanks to Patrick Van Der Hyde's effort, asking how to group lower % slices into a single one in a pie chart.

Having worked on a similar project grouping lower ranked slices before, I thought this is easy. This time, it's about grouping lower % slices. And it takes a few more steps.

The purpose in this example is to use pie chart to view the top % countries in terms of sales and group the lower % slices into one. Let's get started.

Build the Pie

1.Select the Pie chart type and make the working canvas Entire View.

2.Create a parameter [N% Chooser] which allows us to choose any % as threshold. One can skip this if the percentage is a given number.

3.Drag the [Country/Region] to the Detail shelf. Right click it and sort it by Sum([Sales]) descendingly.

4.Create a new field [Top % + Others Label] which labels all low % slices as "Others". Drag it to the Color shelf. Note that the percentage is dynamically controlled by the parameter 0.01*[N% Chooser]. One replace it by a given number if necessary.

5.Create a new field [Top % + Others Sales] which sums up the sales of all Others into one value. We will explain the logic later. For now, drag it to the Angle shelf. Right click the pill and select Compute Using>Country/Region.

Now we have a barebone pie chart that groups the lower % !

Dress up

We need to add labels to the pie.

- Drag [Top % + Others Label] to the Label shelf.
We see that there are more than one slices labeled as Others. We only need one label and need to filter the rest.

- Drag [Top % + Others Sales] to the Filter shelf. When prompted with filter settings, click OK.

- Right click the filter pill and select Compute Using>Country/Region. When filter settings pops up again, click Special tab and select Non-null values.

- Create [Percent] field and drag it to the Label shelf. Right click on it and select Format to make the numbers in %.
- [Top % + Others Sales]/Total(Sum([Sales]))
Now we are essentially done. Further cleanup is recommended:
- Click the Size shelf and adjust the size of the pie.

- Create a [Blank] field with "" and drag it to both Row and Column shelves. Then right click it and uncheck Show Header. This will remove the headers and leave the canvas clean.

- Right click the canvas and select Format. Go the grid menu and turn Row Divider to None. This will remove some unnecessary horizontal lines.

Now we have a perfect pie chart that groups the lower % slices into one, dynamically with the parameter! The workbook is here.

Calculating [Top % + Others Sales] and Filtering

The calculation is as follows:

if [Top % + Others Labels]="Others"
then if Last()=0
then Window_Sum(if [Top % + Others Labels]="Others" then Sum([Sales]) end)
end
else SUM( [Sales] )
end

It shows that, for the new label "Others", we need a different calculation. Otherwise it's just Sum([Sales]).

Widow_Sum() will only take into account the sales of those labeled with "Others".

There are likely multiple slices labeled as "Others". We only need one. Last()=0 will do that. The sales figures for the rest will be Null. This will be used in the filter to exclude them.

PS. Pie chart has met with great resistance in the visualization community. One of the problems is that it looks messy and cluttered when there are too many slices. Still, people are attracted to circle shapes more than others. Hope this tutorial will help creating a cleaner and more appealing pie chart and relieve the pain at the viewer side.

Top N & Grouping Others Series

Grouping Lower Ranked Bars in Bar Chart

Grouping Lower Ranked Slices in Pie Chart

Grouping Lower % Bars in Bar Chart

Grouping Lower % Slices in Pie Chart

Dynamic Pie Chart: Top N, Sorting Slices and Grouping Others
Posted 15th March 2015 by Alexander Mou

0
Add a comment
Mar

13

Coloring Logarithmically for Optimal Human Perception
A recent post by James Eichinger inquired about the best practices of manipulating color values because his viz is swamped by high intensity colors. A solution is mentioned by David Walp and Joshua Miligan as logarithmic transformation:
- Use natural logarithm LN([Measure]) for coloring, instead of [Measure].
Joshua even had a great article about its use in mapping.

Here is an example I produced using the Superstore data to visualize sales by city. You can notice the stark difference between the two coloring schemes. The workbook can be found here.

This reminds me of the human perception theory that we see things in a logarithmic way. The earliest scientific discovery regarding this topic is by Greek astronomer Hipparchus, some 2000 years ago when studying the relationship between star brightness and human perception. In the 19th century, a formal theory is created as Weber-Fechner Law.

From Wikipedia:

"The law states that the just-noticeable difference between two stimuli, is proportional to the magnitude of the stimuli, (and the subject's sensitivity), i.e. if you sense a change in weight of .5 lbs on a 5 pound dumbbell, you ought to feel the extra pound added to a 10 pound dumbbell."

That is, the human sensation/perception difference ∆P to weight W or to any external stimulus is proportional to ∆W/W instead of to ∆W, namely,
- ∆P=K(∆W/W)
Through integration we get
- P=K*LnW + C
K and C being constants.

There are more recent articles on this ever-fascinating topic in Quora, Huffington Post and IEEE Spectrum.

In star gazing, the stellar brightness is attenuated logarithmically by the sheer distance. It's the same for sound during its propagation. For data visualization, it happens on computer screen which is very close to our eyes. The force of nature does very little to attenuate the screen light. My postulation in this case is that we see light intensity as it is, much in a linear way, rather than logarithmic. To mimic the force of nature for the best human perception, we can mathematically transform the color intensity by ln().

It applies best when there is a wide range of color intensities, in which case low intensity colors tend to be obscured by high intensity counterparts if we linearly scale the color. By applying the logarithmic transformation, we curbed the high intensity and see more low intensity colors. It gives us more insights into the shade, at the expense of non-differentiation of high intensity colors.

It is a great tool with limitations. My suggestion is, use it with care and without misleading viewers by labeling clearly.

BTW, I proposed an idea to Tableau to include logarithmic transformation as an option in color editor. Please vote it up if you like it.

PS. Perception is everything, except being linear. It is logarithmic.
Posted 13th March 2015 by Alexander Mou

0
Add a comment
Mar

12

Nested Sorting and Top N per Category via Rank_Unique

This task can be implemented using Index() as documented by many. Here I am going to present an alternative solution based on Rank_Unique(). I found it a little easier than using Index().

Let's use the classic Superstore example. We want to get Top N Categories per Department in terms of Sales.

Nested Sorting

Step 1.

Drag Department and Category to the Row shelf.

Step2.

Create a calculated field Rank_Unique: Rank_Unique(Sum([Sales])) .

Step 3.

Drag it to the Row shelf and turn it into Discrete. Move it to between Department and Category.

Step 4.

Right click on Rank_Unique and select Compute Using>Category

Now, the Nested Sorting is achieved without [Sales] in view! Yeah. We don't really need it if all we want is the ranking of categories per department.

Top N Filter
Next is to set up the Top N filter.

Step 5.
Drag the green Continuous pill Rank_Unique to the Filter shelf. Click OK when the filter settings pops up.

Step 6.
Repeat Step 4, for the filter pill. Select At Most for the filter settings when prompted again.

Step 7.
Right click Rank_Unique and show quick filter. And we have a sliding bar for Top N.

Voila, we are done!

An alternative is creating a Rank_Unique<=N filter. I will leave that as an exercise.

Dress Up

The above is the minimum barebone version of nested sorting and top N per category viz. To dress it up, we may need to do the following.

- To see the visuals and values, drag [Sales] to both Column shelf and Label shelf.

- Hide the rankings by unchecking Show Header under Rank_Unique.

- Edit the filter title to be Top N.
- Drag Department to Color Shelf to add some color.

The workbook can be downloaded here.

Bottom N

Given that we know how to get the top N per category, we only need to replace
Rank_Unique(Sum(Sales))
by

Rank_Unique(-Sum(Sales))
Following the same steps, we get the bottom N per category. Make sure the table calculation is always Compute Using>Category.

Both Top and Bottom N

Create a filter as follows:

Rank_Unique(Sum(Sales)) <=N or Rank_Unique(-Sum(Sales)) <=N

Drag it to the filter shelf and fix the table calculation. Then it's done.
A more detailed post on this can be found here
One Line of Code to Get Both Top N & Bottom N

Why not Rank()?

Rank() function produces non-distinct rankings as well as Rank_Modified() and Rank_Dense() do. Thus sometimes we may not get exactly N elements using a Rank()<=N filter. Rank_Unique() returns distinct rankings even for identical values. If you expect 10 elements consistently from a Top 10 filter, instead of 9 or 11, Rank_Unique is the way to go.

A review of rank functions in Tableau is here.

Posted 12th March 2015 by Alexander Mou

3
View comments

Vizible Difference

In search of data visualization excellence

Tableau at Twitter

Add a comment

The Nuance in Top N

View comments

Selected as Best of Tableau Web ... Feb 2015

Add a comment

Grouping Lower % Bars in Bar Chart

View comments

Grouping Lower % Slices in Pie Chart

Add a comment

Coloring Logarithmically for Optimal Human Perception

Add a comment

Nested Sorting and Top N per Category via Rank_Unique

View comments