1. Tableau recently published a white paper on LOD, or Level of Detail, calculations. The paper includes a very nice chart on the precedence of filters.
    In particular, it shows the order in which the various filters are applied in succession. It is essential for any viz developer to understand this precedence and use filters accordingly.

    I found that the chart above ignores sets, which are a class of filters in their own right.

    Sets as Filter
    Sets are a special class of filters that narrow the scope of the data set. When you create a set and put it on the filter shelf, all subsequent calculations take place in the context of the set. The filter conditions for the set can be configured in this interface:
    The definition of the filter can be based on dimensions, measures, or both. Only the data that meet the conditions are included in the set.

    For a dimension, right-click it and we can see filter settings like this.
    Note that the Wildcard option can also create a set, even though it is not included in the Create Set menu!

    So there are four ways to create a set filter; in other words, these four filter types are all set filters:
    - General: handpick a list
    - Wildcard
    - Conditional
    - Top

    These conditions can all be used in parallel to define a set!

    Revised Precedence of Filtering
    Here is what I drew as the revised precedence of filtering.
    All filters except the last one reduce the number of rows/records or narrow the scope of the original data set. 

    The last one - Table Calc Filters - only selects the part of the calculation results that is of interest to the viewer and hides the rest. It by no means removes a single bit of data.

    Context filters precede set filters and all the rest. Set filters further narrow the scope of the data to what is within the set, acting like a context filter for the dimension filters that follow.

    We have built an example to illustrate how set filters work with dimension filters and LOD.

    Sets vs Dimension Filter
    In the example, a set is created for the top 10 customers by total sales. The category filter can have multiple options, and all of them are evaluated within the top 10 customers. The set provides the context for the category filter here.

    Sets vs LOD
    LOD expressions work independently of sets, so sets have no effect on LOD calculations. The following example shows that, with the set on the filter shelf, we can still use an LOD expression to calculate the minimum sales per customer per category over the entire customer base.
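    The interplay described above can be simulated outside Tableau. The sketch below is plain Python over made-up data (names and numbers are illustrative, not from Superstore): a set filter (top-2 customers by total sales) acts as context for a category filter, while a FIXED-style LOD value is computed on the unfiltered base, mirroring why the set has no effect on it.

```python
# Simulate Tableau's filter precedence on a tiny order table.
# Each row: (customer, category, sales) -- invented sample data.
rows = [
    ("Ann", "Tech", 100), ("Ann", "Office", 40),
    ("Bob", "Tech", 80),  ("Bob", "Office", 10),
    ("Cat", "Tech", 5),   ("Cat", "Office", 3),
]

# FIXED-style LOD: min sales per customer, computed on the ENTIRE base,
# i.e. before the set and dimension filters are applied.
lod_min = {}
for cust, _, sales in rows:
    lod_min[cust] = min(lod_min.get(cust, sales), sales)

# Set filter: top-2 customers by total sales. It acts as context
# for the dimension filters that follow.
totals = {}
for cust, _, sales in rows:
    totals[cust] = totals.get(cust, 0) + sales
top2 = set(sorted(totals, key=totals.get, reverse=True)[:2])
in_set = [r for r in rows if r[0] in top2]

# Dimension filter: keep only "Tech", evaluated WITHIN the set.
visible = [r for r in in_set if r[1] == "Tech"]

print(top2)            # the set: {'Ann', 'Bob'}
print(visible)         # Tech rows for the top-2 customers only
print(lod_min["Cat"])  # 3 -- the LOD still sees Cat despite the set filter
```

    The last line is the point: the set filter never reached the LOD result, just as the revised precedence chart predicts.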
    The examples are included in a workbook which can be downloaded.

    In conclusion, we showed where sets fit in the precedence of filtering: after context filters and LOD, and before dimension filters.






2. This is meant to be a general incremental extraction scheme, derived from an example of our own work with Salesforce data.

    We use Alteryx to extract data from Salesforce and then share the data source in yxdb/tde format. The initial extraction was simple and brute-force: extract the data fully every day. Now the data volume is huge and the extraction takes hours. It consumes bandwidth, and worse, it may time out because the Salesforce server can issue an error like "the query is taking too long".

    An incremental extraction becomes imperative, but it is not straightforward to implement. With the help of Carlos Avila from Alteryx, I was able to put together a solution.

    It consists of the following steps:
    1. Extract the last timestamp (CreatedDate) from the existing data file.
    2. Send the timestamp to a macro, and the macro will extract the data created after the timestamp.
    3. Union the new data with the existing data.
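    The three steps can be sketched in plain Python. This is only a conceptual model, not the Alteryx module itself: `fetch_after` is a hypothetical callback standing in for the Salesforce macro, and the toy records are invented.

```python
from datetime import datetime

def incremental_refresh(existing, fetch_after):
    """Sketch of the three steps above. `existing` is the current data
    (a list of dicts with a 'CreatedDate'); `fetch_after` stands in for
    the macro that queries Salesforce for records created after a given
    timestamp (hypothetical callback, not a real Alteryx/Salesforce API)."""
    # 1. Extract the last timestamp from the existing data.
    last_ts = max(row["CreatedDate"] for row in existing)
    # 2. Pull only the records created after that timestamp.
    new_rows = fetch_after(last_ts)
    # 3. Union the new data with the existing data.
    return existing + new_rows

# Toy source standing in for Salesforce.
source = [
    {"Id": 1, "CreatedDate": datetime(2015, 7, 1)},
    {"Id": 2, "CreatedDate": datetime(2015, 7, 5)},
    {"Id": 3, "CreatedDate": datetime(2015, 7, 9)},
]
existing = source[:2]  # what the .yxdb already holds
refreshed = incremental_refresh(
    existing, lambda ts: [r for r in source if r["CreatedDate"] > ts]
)
print([r["Id"] for r in refreshed])  # [1, 2, 3]
```

    Only the one record newer than the stored timestamp travels over the wire, which is why the refresh drops from hours to minutes.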

    The module ends up like this:
    The existing data is in a .yxdb file. The result is written back to the same file after the union. The Summarize tool extracts the last CreatedDate, which is then formatted and fed to the macro. Note that it is good practice to place a 'Block Until Done' tool before writing the results back to the source file, because we are also reading data from it.
    Note that we need to select the control parameter that goes into the macro.
    Then right-click the macro and open it for configuration.

    The macro is as follows where we need to configure two tools.

    1. Salesforce connector
    Besides the credentials, we need to add a SOQL condition to the query:
    • CreatedDate > LastCreatedDate
    2. Update Value
    We only need to highlight the text string where the replacement will take place.
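    Conceptually, the Update Value tool performs a text substitution on the saved query: the incoming control parameter overwrites the highlighted string. A rough Python analogue follows; the object, fields, and placeholder name are illustrative, not the exact query from the module.

```python
# The query saved in the macro contains a placeholder timestamp.
soql_template = (
    "SELECT Id, Name, CreatedDate FROM Account "
    "WHERE CreatedDate > LastCreatedDate"
)

# The control parameter replaces the highlighted text, much like the
# Update Value action does on the Salesforce connector's query.
last_created = "2015-07-29T00:00:00Z"
soql = soql_template.replace("LastCreatedDate", last_created)

print(soql)  # the WHERE clause now carries the real timestamp
```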

    That's it folks.

    This reduces the extraction from hours to minutes!

  3. [Update - a couple of related posts have been published recently:
    Counting Active Orders via Pivot
    Counting Active Customers ]

    Introduction
    Things come and go, including yours truly. C'est la vie. Life started when I checked into Earth on my birthday. When will it end? We will see. In between, I am counted as an active person in the national census.

    We often need to count moving parts with a limited lifetime at a specific point in time. This is sometimes called stocktaking.

    Stocktaking has multiple equivalent expressions:
    - Churn analysis
    - Take stock
    - Take inventory
    - Make inventory
    - Cycle count
    - Inventory audit
    - Head count
    - Faire l'inventaire
    - Count concurrent users

    There may be more, and I don't know them all; it depends on the industry you work in. Stocktaking refers to checking inventory at a specific location and at a specific time. It represents a pattern of recurring issues across various business operations.

    Stock or inventory can be anything that is dynamic with a life cycle (in then out):
    - goods in a warehouse
    - hospital patients
    - hotel guests
    - company headcounts
    - active customers
    - outstanding help desk tickets 

    There is a large collection of similar problems:
    1. Utilization, Queues, Census, Throughput for Blocks, Floors, Units, Rooms, Offices, etc. - What do you do?
    2. FAQ: Open & Close Dates

    Stocktaking is important for customer lifetime estimation, capacity planning, and cost and net profit calculations in any organization that needs to deal with a dynamic flow of business entities.

    Here I am going to show a couple of solutions for visualizing such a problem.

    Example: Taking Stock of Outstanding Orders 
    The example data set is our familiar Superstore's, where we will count the "Outstanding Orders", that is, the orders that have yet to be shipped. Between Order Date and Ship Date, an order is considered outstanding.

    Table Data Structure
    Note that in the Superstore data set, Order Date and Ship Date are in the same row for an Order ID. This is referred to as a lookup table (in contrast to a transaction table, where Order Date and Ship Date are in separate rows, one row per transaction).

    Each order (same ID) may contain multiple mini orders, one per Product Sub-Category. Each row in the data set represents one mini order. Each mini order has its own Ship Date.

    What do we count?
    So, to keep it simple, we will count the outstanding mini-orders on any specific date.

    Two Approaches
    I would like to describe two approaches to solving the problem in Tableau. Both approaches originally trace back to Zen Master Joe Mako.

    Approach 1. Custom SQL
    The following simple SQL converts the lookup table to a transaction table. On the day we receive an order, it counts +1 toward outstanding orders; on the day we ship it, it counts -1. (Check out this article on the use of custom SQL over Excel files.)
    • Select [Order Date] as [Date], 1 as [Count] from [Orders$]
    • Union all
    • Select [Ship Date] as [Date], -1 as [Count] from [Orders$]
    Calculating the running total of [Count] gives us the Outstanding Orders over time. Note that at the end of the ship date range, when all orders are shipped, the number of outstanding orders drops to zero. This is one way to validate the correctness of the calculation.
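    The same +1/-1 bookkeeping can be checked in a few lines of Python. The dates below are invented for illustration; the logic is the union-plus-running-total from the SQL above.

```python
from collections import Counter
from datetime import date, timedelta

# Each order contributes +1 on its order date and -1 on its ship date,
# exactly like the two halves of the UNION ALL query.
orders = [
    (date(2015, 1, 1), date(2015, 1, 3)),
    (date(2015, 1, 2), date(2015, 1, 5)),
    (date(2015, 1, 3), date(2015, 1, 4)),
]
delta = Counter()
for ordered, shipped in orders:
    delta[ordered] += 1   # +1 on the order date
    delta[shipped] -= 1   # -1 on the ship date

# Running total of [Count] = outstanding orders per day.
day, end = date(2015, 1, 1), date(2015, 1, 5)
running, outstanding = 0, {}
while day <= end:
    running += delta[day]   # Counter returns 0 for days with no activity
    outstanding[day] = running
    day += timedelta(days=1)

print(outstanding[date(2015, 1, 2)])  # 2 orders outstanding on Jan 2
print(outstanding[end])               # 0 -- all shipped, the validity check
```

    The final zero is the validation mentioned above: once every order has shipped, the running total must return to zero.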
    Approach 2. Scaffold
    This approach uses a simple scaffold to build a common time frame. It doesn't require SQL to convert the table to a transactional one, so there is no need to reshape the data. Instead, we need to duplicate the data source.

    Scaffold
    - The scaffold table consists of a single column named [Record] with two values: 1 and 2.
    - We need two parameters [Start Date] and [End Date]. [Start Date] must be earlier than the first [Order Date]. [End Date] must be later than the last [Ship Date].
    - Using both parameters, we can build a [Date] dimension that covers the entire date range of interest. See the formula in the picture below. (We need to turn on "Show Missing Values" under the [Date] dimension.)

    Duplicate Data Source
    We need to duplicate the data source, one for [Order Date] and the other for [Ship Date]. Then we will blend both dates with the one in Scaffold table.

    - Rename [Order Date] in one source and [Ship Date] in the other to [Date]. Blend both with the [Date] in Scaffold, which is used as the primary data source.
    - Create 3 calculated fields under Scaffold.
    [Order Records]=ZN(SUM([Orders (Sample - Superstore Subset (Excel))].[Number of Records]))
    [Ship Records]=ZN(SUM([Orders (Sample - Superstore Subset (Excel)) (copy)].[Number of Records]))
    [Outstanding Orders]=[Order Records]-[Ship Records]

    ZN() is very important because there is a lag between [Order Date] and [Ship Date]: on a given date, one source may have records while the other has none, producing Nulls. The subtraction in [Outstanding Orders] goes wrong without ZN().
    - Set the table calculation of [Outstanding Orders] to Running Total. This gives the outstanding orders on any date.
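    The scaffold logic boils down to: for each date in the frame, count orders and ships separately, treat dates with no records as zero (the ZN() step), and take a running total of the difference. A plain-Python sketch with invented dates:

```python
from datetime import date, timedelta

# Stand-ins for the two duplicated sources, each renamed to [Date].
order_dates = [date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 3)]
ship_dates  = [date(2015, 1, 3), date(2015, 1, 4), date(2015, 1, 5)]

# Scaffold: every date from [Start Date] to [End Date].
start, end = date(2014, 12, 31), date(2015, 1, 6)

running, outstanding = 0, {}
day = start
while day <= end:
    # .count() yields 0 on dates with no records -- the ZN() analogue.
    # Without that zero-fill, the subtraction below would break on days
    # where only one of the two sources has data.
    order_records = order_dates.count(day)   # [Order Records]
    ship_records = ship_dates.count(day)     # [Ship Records]
    running += order_records - ship_records  # [Outstanding Orders]
    outstanding[day] = running               # Running Total per date
    day += timedelta(days=1)

print(outstanding[date(2015, 1, 2)])  # 2 outstanding on Jan 2
print(outstanding[end])               # 0 once everything has shipped
```

    Note the scaffold deliberately starts before the first order and ends after the last ship date, matching the constraints on the two parameters above.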

    The workbook including both approaches can be downloaded here.

    Following is some discussion on related topics.

    Scaffold with Filters
    If you need to filter the chart in the scaffold option, you need to include a few more dimensions in your scaffold. See this post for the approach.
    Blending Data Via Multi-Dimensional Scaffolding

    Close Date vs Check Out Date
    A bit of nuance exists in the statement of different use cases:
    [Close Date]: The last day the item is in stock.
    [Check Out Date]: The first day the item is out of stock.

    [Ship Date] in our example is like [Check Out Date].

    Here [Check Out Date] is one day later than [Close Date]. For the calculation, we always need the first day on which the item is no longer in inventory. Given [Close Date] or its equivalent, we need to do this:
    • [Date]=DateAdd('day',1,[Close Date])
    In the custom SQL approach, the code needs to be modified as:
    • Select [Open Date] as [Date], 1 as [Count] from [Orders$]
    • Union all
    • Select DateAdd('day',1,[Close Date]) as [Date], -1 as [Count] from [Orders$]
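    A quick sanity check of the off-by-one: an item with a [Close Date] is still in stock on that day, so the -1 must land one day later. A small Python sketch (dates are made up):

```python
from datetime import date, timedelta

open_date, close_date = date(2015, 1, 1), date(2015, 1, 3)

# [Close Date] is the LAST day in stock, so the item leaves the count
# on the following day -- the DateAdd('day', 1, [Close Date]) step.
out_date = close_date + timedelta(days=1)

def in_stock(day):
    # +1 is applied on open_date, -1 on out_date, so the item counts
    # on every day in the half-open interval [open_date, out_date).
    return 1 if open_date <= day < out_date else 0

print(in_stock(close_date))  # 1 -- still in stock on the close date
print(in_stock(out_date))    # 0 -- gone the day after
```

    Applying the -1 on [Close Date] itself would drop the item a day too early, which is exactly what the modified SQL above avoids.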
    Lookup Table vs Transaction Table
    Keith Helfrich presented some interesting thoughts on these two structures. Data may come either way:
    - Lookup table
    Both Order Date and Ship Date are on the same row of Order ID.

    The discussion in this article assumes such a data structure. All the methods apply.

    - Transactional Table
    Order Date and Ship Date are recorded in different rows, that is, two rows per order. There is a single Date column, plus a Type column to denote Order or Ship.

    Then we only need to create a calculated field [Count]:
    • if [Type]='Order' then 1 else -1 end
    Set the table calculation of [Count] to Running Total, and you are done.

4. So far I have attended the San Francisco Bay Area Tableau User Group (SFBATUG) meetup twice. The first time was at Twitter HQ in March; this time it was at Charles Schwab on July 29, 2015. In between, it seems I missed quite a few.

    Each time it was fantastic. The events were well organized, the speakers prepared and delivered great presentations, and the venues are among the best in town, with nice food and wine. I am so grateful.

    At the last meetup, Keith Helfrich gave a powerful speech recounting his personal journey, and Michael Perillo showed his love for his wife by creating a viz. There were quite a few memorable moments at this meetup.

    Here I created a viz about the last meetup, including photos I took during the presentations. I also figured out a way to embed slides in a Tableau dashboard. There you go.

    Last but not least, remember to register for the forthcoming meetup on Sept 23. The registration link is at the bottom of the viz.


5. The following articles of mine were recently selected for the Best of Tableau Web:

    May 2015


    June 2015


    July 2015

    Very grateful to the Tableau community for the inspiration and encouragement!
