1. Histogram can be created in many ways. The de facto histogram is built with bars. With Index() we can create one a bit more colorful. Click image below to see an interactive version or download the workbook. Will describe how to create this next.
    This one is created in Gantt chart, overlaid with boxplot and colored by [Profit]. It is regarding the customer distribution sliced by the number of orders, based on the superstore data set.

    The important steps to create the chart are:

    1.Create a calc field [CountD of Order] and move it to Columns. Turn it to continuous.
    2.Drag [Customer ID] to Detail shelf.
    3.Create Index=index() and move it to Rows. Set it to compute use [Customer ID].
    4.Select Gantt chart in Marks.
    5.Drag [Profit] to Color shelf.
    Now the Gantt marks are colored but not sorted.
    6.Right click Index and select Edit Table Calculation. In the advanced settings, set the sorting by Sum(Profit).
    7.Turn on boxplot either by adding reference line or dragging from the analytics tab.
    8.Add labels to the chart by holding CTRL key and drag the Index pill to the Label shelf. This makes a copy of the pill.

    Now we got an alternative histogram that tells us the distribution per number of orders, variance, profitability. Note that, with every mark out there, one can drill down to the gory details.



    0

    Add a comment

  2. Both Histogram and Box-n-Whisker Plot are popular tools to describe the distribution of data in different ways. They provide different insights into the distribution. It's quite interesting to overlay one with another.

    Today, we will show how to put them together in one chart.
    The above is an example using the superstore data set. The histogram is about the distribution of the number of products per the number of units sold. Then sliced by subcategories.

    The histogram is built via Size() according to an approach described in an earlier article. The difference here is that we are using LOD to calculate the number of orders per customer:
    • [Units Sold] = {FIXED [Product ID]: SUM(Number of Records)}
    The advantage of LOD expression is that it can be both a dimension and a continuous pill. A continuous axis will make a real histogram. (See Jonathan Drummey's comments)

    With a continuous axis, we can create a box plot!

    So it's pretty simple. Based on Histogram via Size() approach, we can create a histogram on the distribution of customers (Product ID as dimension).

    The marks are chosen to be stacked. Note that it is possible to minimize the number of marks in the chart. But we can't filter the nulls. Otherwise the box plot stats won't be correct.

    The interesting insights we get are:
    1.The distribution of the number of products over the number of units sold.
    2.The median and quartiles over the number of units sold

    The above idea came when I played with box plot over jitters as shown in this blog. The jitters are visually appealing. It shows the sample density distribution in a visual way, which is much like a histogram, but not quantified. I found that the dots can be organized as a histogram.

    The jitters are generated using Index(). We can also use index() to create a histogram.

    Voila, yet another addition to the series of histogram charting.

    PS.The bar chart doesn't have to be a histogram. It can be another measure. Here is the average product price over box plot.

    5

    View comments

  3. I have written a few blogs recently on the subject of data scaffolding. Let me make a summary of them.

    Data scaffolding is a technique to artificially create a data structure for the purpose of visualization. It will either reshape the original data or blend multiple data sources in such a way for better visualization.

    The technique is pioneered by Tableau Zen Master Joe Mako.

    The general methodology is as follows
    1.Create a table of pure dimensions to act as the primary data source. It sets up the structure necessary for visualization.
    2.Create measures from secondary data sources by blending.

    There are two major use cases for scaffolding: Data Reshaping and Data Blending

    Data Reshaping: single data source
    The original data structure is altered through the scaffolding, such as unpivoted, for better computation or better visualization.

    Scaffolding Video Lecture by Joe Mako
    Data Reshaping via Scaffolding

    Data Blending: multiple data sources
    In the regular blending, there could be data loss in secondary data sources, because it's kind of like a left join.

    In many cases, no one data source is more primary or secondary than the other sources. Thus we need a third party to act as the primary to all the actual data sources. This third party is a data scaffolding that is created artificially.

    Blending Data Via Multi-Dimensional Scaffolding
    Lossless Data Blending via Scaffolding
    Blending Dates via Scaffolding

    Examples of Scaffolding around the web:
    Facebook Jeopardy: Create a Single Sheet Waterfall Chart in Tableau
    GOOGLE ANALYTICS IN TABLEAU: BLENDING DATA FROM MULTIPLE ACCOUNTS
    Basic Monte Carlo Simulations in Tableau


    0

    Add a comment

  4. So far, we have talked about data blending via scaffolding: 1,2,3. Blending involves 2 or more data sources.

    Data reshaping is about a single data source. By scaffolding, we can alter or transform the data structure in order to create visualizations that was not straightforward using the original data set.

    Again, Zen Master Joe Mako has lectured about scaffolding in an hour long video focusing on data reshaping or dealing with a single data source. He has included 4 examples. Here we are going to include 2 more examples.

    Why alter the data structure? Because Tableau requires the data set to be in certain structure to be rendered as charts and tables. In other words, we need the right dimensions which can generate the required number of marks.

    Example 1. Data Padding Via Scaffold
    In this use case, some manager positions are missing in some regions. The rule is to make the assistant manager to be the acting manager. Let's see how this can be done.

    So the original data set is as follows
    We can see that some of the manager positions are missing. Not all regions have managers. The desired result is as follows:
    Let's create a scaffolding which corresponds to the desired structure:
    It has 2 dimensions and no measures. All we need to do is to fill the values in this scaffold.

    1.Create the scaffolding table in Excel. (Here we have a small table with only 2 dimensions. If more dimensions, see this article for creating multidimensional scaffolding.)

    2.Import this table to Tableau as a data source. It will be used as the primary data source.

    3.Blend the data set with this scaffolding table.

    4.Create a calc field [Employee] in the scaffold, which is a reference to the secondary data set.

    5.Create a calc field [Emp++] in the scaffold:
    • IF ISNULL([Employee]) and ATTR( [Title] )='Manager'
    • THEN WINDOW_MAX( [Employee] )
    • ELSE [Employee]
    • END
    6.Drag the [Emp++] to the rows shelf and set the table calculation to advanced mode. Set the partition to be Region and move the rest to addressing.
    Thus we get the result as expected. The workbook can be downloaded here.

    Example 2. Creating a Dimension for Measure Names
    It is often desirable that we can use [Measure Names] as a dimension. But we can't. Then scaffolding comes to the rescue. Here is a real life question that Joe Mako helped in solving the puzzle via scaffolding.

    The data set is as follows. It is a survey on 3 questions which respondents will answer with yes or no. There are a number of respondents during 4/1/2015 and 5/23/2015.
    The goal is to tally the percentage of yes or no per day per each survey question like this:
    In the original data set, each question is a column with attribute. There are 3 questions. What we need is a single [Survey] dimension that contains the 3 questions with [Answer] like yes or no. So we build a scaffolding like:
    Then we need to build measures like Count and Count % per question per day.
    1.Date is created through [Record] with 2 records, that allow us to create Start Date and End Date via 2 parameters. Through Show Missing Values, we get all the dates between the 2 dates. This will allow us to view data per day, per week, per month etc.

    2.Create [Answer (Fill)] and [Survey (Fill)]
    These two new fields are going to fill every data marks in the table. Total() function is a way to densify the data to where there were no data marks before.
    • Answer (Fill) = Total(Max(Answer))
    • Survey(Fill) = Total(Max(Survey))
    3.Create measures Count and Count %
    Set them to compute using Date.
    4.Thus we get our chart and table to visualize the survey result per day.

    So you see how scaffolding is used to create the table and chart! The workbook can be downloaded from here.
    0

    Add a comment

  5. Since my last article on this topic, there have been some discussions. The real diagram could be more complicated than what is been drawn here. The diagram here could be a gross approximation to the real one. Let's try to make it evolve towards the ultimate one.

    So here are a few updates:

    - added Custom SQL which is an integral part of the Tableau functions. It can be used as the first filter and transformer for the raw data.
    - added context filter for the secondary data source before blending.
    - moved blending to be after context filters.

    I would like to point out that the dimension filters have two genres:
    - Linking dimension filters will operate on both primary and secondary data source.
    - Non-linking dimension filters will operate on their respective data source.

    Linking dimension filters work after blending.

    Non-linking dimension filters seem to work before blending. But as long as they only filter their respective data source with or without blending, the order of operations between blending and non-linking dimension filters doesn't matter . We can thus assume all dimension filters operate after blending.

    So here is the new order of operations diagram drawn in Tableau. Click to see the viz.
    6

    View comments

Blog Archive
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.