Right before the 2015 Tableau Conference, Rody Zakovich of the Tableau community forum initiated a series of appreciation pieces to a select group of members, including me:
Community Appreciation - Alexander Mou
I am very much moved; it meant a lot to me. I have been on the forum for over a year, and I am greatly impressed by the amazing energy and camaraderie in the community. The people there are great and friendly, and they can help you solve almost any Tableau problem; they effectively act as first-line technical support on behalf of Tableau Inc. I learnt a great deal by helping others as well.
As my Tableau skills got better, I started answering questions from other members, and I became more active this year. Tracy Fitzgerald has been writing a weekly digest for the community, and from those digests I am able to trace my progress:
Community Digest - Ending 3/16: 25 correct answers and Questions for Breakfast badge
Community Digest - Ending 4/6: 50 correct answers and Question Slayer badge
Community Digest - Ending 6/8: 100 correct answers and the Answer Wizard badge, which seems to be the highest distinction for a forum member.
I keep working on the problems posted in the forum. In a good week, I can even get 10 correct answers:
Community Digest - Ending 6/29
The forum has become a great inspiration for my blog. You might have noticed that I referred to the forum posts quite often.
And one day, while looking for the Crow's Nest (a place where questions have received zero answers), I discovered by chance that I was at the top of the leaderboard. The ranking is based on how many people I have helped during the past 12 months, so I may disappear from this dynamic board if others do more than I do. This is a screenshot I took today, 10/25/2015.
Below is the note I wrote at the forum in response to the community appreciation:
Rody, Mark and Simon, thanks for the kind words! I am so touched.
Actually, I have probably learnt more from the people in this forum than I have put into it. It is such a vibrant community, and I am very grateful to it. There are so many interesting problems to solve; it is kind of addictive.
Yes, I have answered a few posts in Chinese, Korean, Japanese, French, Portuguese, Spanish, etc., which are less likely to get an answer. As long as people speak Tableau, the lingua franca of this forum, they deserve an answer. Many non-native English speakers may not write in English, but they most likely can read it. I don't speak most of their languages, but armed with Google Translate, I can figure out most of what they intend to say. Patrick Van Der Hyde and Diego Medrano do that all the time; anyone here can do it too. English not being my native language, I apologize to Her Majesty if I have misused or abused the Queen's English. By the way, Her Majesty must be really proud to know that a contingent of the British Empire is colonizing the leaderboard of the newfound territory called Tableau World.
I would encourage anyone here to start a blog. Many questions are recurring or follow some kind of pattern; they are déjà vu. A quick reply can then be a pointer to a blog post. In other words, blogs make knowledge scalable; otherwise, we have to answer questions one at a time, sometimes repetitively. Given that Tableau has become a victim of its own success, more questions will inundate the forum.
Also, organizing information creates value, not only for the general body of knowledge but also for the blogger himself. For me, I write, therefore I learn. If I can't write it down, I don't understand it, at least not to the degree I would like to.
There are many people I am grateful to in the Tableau community, where diverse yet like-minded people gather and help each other learn and create. In particular, I would like to express my appreciation to Andy Kriebel, whose blog was the first one I read; it opened my eyes to the fantastic world of data visualization. I would also like to thank Jonathan Drummey, Joshua Milligan and Mark Jackson, from whose blogs I learnt a great deal. Last but not least, I would like to thank Joe Mako for his generosity in solving whatever problems were too difficult for me and for offering screen-share sessions. As I have found out along the way, many techniques, such as scaffolding and data densification, can be traced back to Joe, the source.
Keep learning and have fun! Hope to meet you at TC15.
-
[Update: There are a few new updates on this topic in the comment section.]
Initially, Tableau published this diagram of the Order of Operations in a white paper.
Context filters are actually regular filters to which we have manually assigned a higher execution priority. On the other hand, every filter provides context for the filters that follow it.
- Data Blending
- Forecast
- Annotations, Reference Lines, Trend Lines, Tooltips, Pages, Titles
Now I have tried to integrate some of Jen's elements into a new diagram, as follows. I also added the primary and secondary data paths; context filters can only be applied to the primary.
I couldn't find the original PowerPoint file in which I drew my first Precedence of Filtering diagram, and I found drawing in PPT a bit tedious: it's not easy to align text, and the file is not easy to update.
So I decided to draw the diagram in Tableau, and voilà, the result. Tableau easily handles the drawing, the labeling and the tooltips, and I no longer had to struggle to align text here and there. The diagram is based on a Gantt chart.
Hope this new diagram helps you get a better understanding of Tableau's inner workings.
-
Scaffolding is a way to blend data. In quite a few cases it gets the job done so well that there is no longer any need to join or union data at the record level.
In recent articles, I described "Lossless Data Blending via Scaffold" and "Blending Dates via Scaffold". Both are in their simplest form: a one-dimensional scaffold blending two data sources with different dimension components. The scaffolding dimension must be a superset of the corresponding dimensions in the secondary data sources.
Again, I would emphasize that the difference between regular blending and scaffold-based blending is:
- Blending: Loss of data in the secondary sources.
- Scaffolding: No loss of data if we so wish, or we can choose to keep only the data of interest. The scaffold acts as the primary; all the actual data sources are equally secondary.
The short answer is that we need to build the filtering dimensions into the scaffolding first; then we can create the chart and filter the result afterwards.
Enter multi-dimensional scaffolding. The detailed answer follows.
Let's take the same example as in "Taking Stock with Start and End Dates", and assume we need to filter the result by Product Category and Customer Segment.
In that example, we created a scaffold with a single date dimension. Now we need to add two more dimensions; the steps are as follows.
1. Create one column per dimension, one dimension per sheet, in Excel
So we have three sheets with friendly names: Date, Product Category and Customer Segment (they could also carry default names like Sheet1, Sheet2 and Sheet3). Each sheet has a single column with a header and the dimension's elements.
2. Cross join all the dimensions using custom SQL
A SQL one-liner suffices to generate the multi-dimensional scaffolding:
Select * from [Date$],[Product Category$],[Customer Segment$]
There are 2 elements in Date (the start date and the end date), 3 elements in Product Category and 4 elements in Customer Segment. Cross joining them generates 2x3x4=24 combinations, thus 24 rows in the scaffolding.
The size of the scaffolding equals the product of the sizes of the dimensions.
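The cross join above can be sketched in plain Python as well. This is just an illustration of the combinatorics (the dimension members below are illustrative, not necessarily the workbook's actual values):

```python
from itertools import product

# Each "sheet" is a single-column list of dimension members.
dates = ["Start date", "End date"]                                     # 2 elements
categories = ["Furniture", "Office Supplies", "Technology"]            # 3 elements
segments = ["Consumer", "Corporate", "Home Office", "Small Business"]  # 4 elements

# Cross join: every combination of the three dimensions becomes one scaffold row.
scaffold = [
    {"Date": d, "Product Category": c, "Customer Segment": s}
    for d, c, s in product(dates, categories, segments)
]

print(len(scaffold))  # 2 x 3 x 4 = 24 rows
```

This is exactly what the custom SQL's comma-separated FROM list produces: one row per combination.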
The next step is to make sure all the secondary data sources are blended with the primary on all 3 dimensions.
Last, by creating the same measure "Outstanding Orders" and dragging Customer Segment and Product Category to the Filters shelf, we can now filter the measure and the associated chart by the two dimensions.
The resulting interactive workbook can be downloaded here.
Dimension Reduction
We have seen that the scaffolding is created from 3 dimensions, and that its size, i.e. the number of rows, is the product of the sizes of the dimensions. This number can become huge if a few of the dimensions are large. Such a huge, bulky scaffolding is sometimes unnecessary: it takes up space and hurts performance. So we need to do some dimension reduction.
For example, in the Superstore data set (depending on the version), there are 3 product categories and 17 sub-categories. If we want to filter by these two dimensions, the rule above suggests we need 3x17=51 rows of scaffolding. But that assumes the two dimensions are orthogonal, and in reality they are not: each category is just a label over some of the 17 sub-categories, and each sub-category belongs to exactly one category. So the two dimensions can be put in one sheet, reducing the size of the scaffolding from 51 to 17. If necessary, this sheet can be cross-joined with other dimensions.
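The reduction can be sketched with a hypothetical category-to-sub-category mapping (the names below follow the familiar Superstore scheme but are illustrative; only the counts matter):

```python
# Hypothetical hierarchy: each sub-category belongs to exactly one category.
hierarchy = {
    "Furniture": ["Bookcases", "Chairs", "Furnishings", "Tables"],
    "Office Supplies": ["Appliances", "Art", "Binders", "Envelopes",
                        "Fasteners", "Labels", "Paper", "Storage", "Supplies"],
    "Technology": ["Accessories", "Copiers", "Machines", "Phones"],
}
sub_categories = [s for subs in hierarchy.values() for s in subs]  # 17 members

# Naive cross join treats the dimensions as orthogonal: 3 x 17 = 51 rows.
naive = [(c, s) for c in hierarchy for s in sub_categories]

# Respecting the hierarchy, each sub-category appears once with its
# own category: only 17 rows are needed.
reduced = [(c, s) for c, subs in hierarchy.items() for s in subs]

print(len(naive), len(reduced))  # 51 17
```

In spreadsheet terms, `reduced` is the single two-column sheet that replaces the two one-column sheets.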
This is how multi-dimensional scaffolding works! It helps us blend multiple data sources and build dimension filters in a very flexible way, creating a genuine alternative to union or join at the record level.
-
The histogram is one of the most widely used analysis methods of all time, and my personal favorite topic to write about. In Tableau, there are a variety of ways to create one; I have to say Tableau is quite versatile. A good language can describe an object, a scene or a feeling in more ways than one can imagine, and Tableau is such a language for data visualization.
There are two steps in creating a histogram:
- Create bins
- Count occurrences per bin
The concept is very simple, but the implementation may not be. To me, creating histograms in various ways has been a great learning experience with Tableau, and it might be the same for you if you go through some of the articles here.
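The two steps can be sketched in a few lines of Python, using made-up values and fixed-width bins:

```python
# Step 1: assign each value to a bin (here, bins of width 10).
# Step 2: count the occurrences per bin.
values = [3, 7, 12, 15, 18, 24, 25, 31]
bin_size = 10

histogram = {}
for v in values:
    bin_floor = (v // bin_size) * bin_size            # step 1: create the bin
    histogram[bin_floor] = histogram.get(bin_floor, 0) + 1  # step 2: count per bin

print(histogram)  # {0: 2, 10: 3, 20: 2, 30: 1}
```

Every technique in the list below is, one way or another, a Tableau-flavored variation on these two steps.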
Here I have put all my articles in one place for easy retrieval and reference. I will update the list whenever I find something new to write about.
Histogram via Index()
Overlaying Histogram with Box and Whisker Plot
Histogram via Running Count
Creating Bins via Math Formula
Creating Bins Made Easy
Histogram via LOD
Histogram via Size()
Histogram via Rank Functions
Dynamic Histogram Over Time
Binning Aggregates for Histogram and Aggregate Distributions
Histograms on Aggregated Measures without Table Calculations
I will also include links to other resources on the same topic. Let me know if you spot a great article on histograms in Tableau.
Jonathan Drummey's histogram collection
http://drawingwithnumbers.artisart.org/tag/histograms/
Joshua Milligan
Slicing By Aggregate
Ben Young
Grouping Bins on a Histogram
Ben Jones
How to Make a Scatterplot with Marginal Histograms in Tableau
Tableau Inc
Creating a Histogram with Binned Data
Histograms training video
When to Use Histograms
Breaking BI
Creating Histograms in Tableau
The Information Lab
Show Me How: Histograms
Interworks
Simple Histograms In Tableau
Oldsynner
Tableau Public 8 (1) histograms
Data Champions
How to Create Bins in Tableau
Tech Analytics
Histograms Tableau Advanced Visualization
-
During a screen-share session with Tableau Zen Master Joe Mako, he showed me this method of calculating a histogram, and I am going to describe it here. Although I have written many articles on histograms, this is the latest one.
I will use an example I have used before: counting the number of Premier League clubs per facility-fee level. There are 7 levels of facility fees, awarded to 20 clubs according to the number of matches broadcast on TV.
The main purpose here is to illustrate the method rather than to seek the optimal one. One of the keys to this method is partitioning, and it can be used with any kind of bins, especially bins created on aggregated measures where advanced table calculations are not available.
The steps are as follows:
1. Bin the clubs
There are 7 levels of facility fees. Each level is a bin with a label; RANK_DENSE() is used to create the bin labels.
2. Sort the clubs and running-count the number of clubs per bin
Sort the clubs by facility fee so that clubs in the same bin sit next to each other; without sorting first, the running count won't work.
Then we can perform a running count within each bin. The Running Count formula is as follows:
- IF [Bin Label] != LOOKUP([Bin Label],-1) THEN 1
- ELSE 1+PREVIOUS_VALUE(0)
- END
3. Get the highest count per bin
The formula for Count is as follows; this table calculation is set to compute using Club.
- IF [Bin Label]<>IFNULL(LOOKUP([Bin Label],1),'')
- THEN [Running Count]
- END
4. Filter out the null values
For the visualization, we filter out the nulls in Count: within each bin, only the last running count is non-null.
Voilà, here is the histogram we expected. The workbook can be downloaded from here.
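The whole method can be sketched in Python with made-up facility fees (the comments quote the corresponding table calculations, so you can map each line back to the steps above):

```python
# Each value is one club's facility-fee level; the level itself is the bin label.
fees = [10, 10, 10, 20, 20, 30]   # made-up fee levels for 6 clubs

fees.sort()  # step 2 precondition: equal bins must be adjacent

running, counts = [], {}
for i, fee in enumerate(fees):
    # IF [Bin Label] != LOOKUP([Bin Label], -1) THEN 1 ELSE 1 + PREVIOUS_VALUE(0) END
    prev = fees[i - 1] if i > 0 else None
    running.append(1 if fee != prev else running[-1] + 1)

    # IF [Bin Label] <> IFNULL(LOOKUP([Bin Label], 1), '') THEN [Running Count] END
    nxt = fees[i + 1] if i + 1 < len(fees) else None
    if fee != nxt:                 # last row of the bin carries the final count
        counts[fee] = running[-1]  # step 4: all other rows would be null

print(counts)  # {10: 3, 20: 2, 30: 1}
```

The `counts` dictionary is the histogram: bins on the keys, club counts on the values.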
-
This is a sequel to Lossless Data Blending via Scaffolding. The blending here is also lossless.
The date dimension is one of the most important dimensions in any data structure. Most data are transactional, with a timestamp attached to each record; needless to say, time series analysis is one of the most important analysis methods.
Let's state the problem with a simple example: two stores have their sales tallied in two tables (two data sources), and headquarters wants to view the overall sales over time.
Here are the very simplistic reports from the two stores:
The sets of dates in the two stores do not coincide. Let's see how to use a scaffold to blend the two tables together and create a single chart of the overall sales.
Scaffolding
As mentioned in Lossless Data Blending via Scaffolding, we need to build a superset of the dates in both tables. For the date dimension this is much easier: the scaffold is a two-row table holding the min date and the max date of the combined set of dates from the two tables:
1. Make the scaffold the primary
Note that the scaffold acts as the primary data source, with the actual store tables as secondary. Drag Date to the Columns shelf and turn on "Show Missing Values" on the scaffold's date pill, and we get all the dates in between. This is the KEY to blending the date dimension!
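What "Show Missing Values" does to the two-row scaffold can be sketched in Python (hypothetical dates):

```python
from datetime import date, timedelta

# Hypothetical two-row scaffold: only the min and max dates are stored.
scaffold = [date(2015, 1, 1), date(2015, 1, 10)]

# Tableau densifies the axis, filling in every date between min and max.
# The equivalent expansion:
start, end = min(scaffold), max(scaffold)
all_dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

print(len(all_dates))  # 10 dates, 2015-01-01 through 2015-01-10
```

Two physical rows are enough to stand in for the whole date range; the densified axis supplies the rest.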
2. Blend the dates
Make sure all the date dimensions from the secondary sources are linked to the primary.
3.Use ZN() to aggregate
We then create a measure Total Sales as
- ZN(SUM([Store A].[Sales])) + ZN(SUM([Store B].[Sales]))
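A sketch of what the ZN() wrapping achieves, using made-up per-date sales: a source with no row for a given date contributes 0 instead of NULL, so the total never nulls out:

```python
# Made-up sales keyed by date; neither store has a row for every date.
store_a = {"2015-01-01": 100, "2015-01-03": 50}
store_b = {"2015-01-02": 80, "2015-01-03": 70}

dates = sorted(set(store_a) | set(store_b))
# .get(d, 0) plays the role of ZN(): a missing value becomes 0, not NULL.
total_sales = {d: store_a.get(d, 0) + store_b.get(d, 0) for d in dates}

print(total_sales)  # {'2015-01-01': 100, '2015-01-02': 80, '2015-01-03': 120}
```

Without the ZN(), any date missing from either store would make the whole sum NULL, punching holes in the combined line.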
Scaffolding with Parametric Dates
Data is dynamic; the end date, in particular, may not be fixed, and we need to change it from time to time. It would be hard to keep changing the static scaffold above, so we can use parameters for the dates: a parameter's default value and maximum value can be modified easily.
Here is the scaffold. We then create a calculated field to set up the Date dimension:
- CASE [Record]
- WHEN 1 THEN [Start Date]
- WHEN 2 THEN [End Date]
- END
Here is a view of the resulting dashboard; click it to go to the interactive version. This technique has been applied in this calculation:
Taking Stock with Start and End Dates