I was asked a question: How to find out the IDs that showed up consecutively 5 times during the last 14 days?
How would you solve it?
Here I came up with 2 solutions. The 2nd one is a little simpler. Note that in either solution, we need to Show Missing Dates to make the calculations be based on contiguous dates. BTW, the last 14 days is irrelevant to the solution.
1. Solution with Running Sum
The idea is, CountD per Day for each ID equals 1 if it shows up, 0 if it doesn't. Then we calculate Running_Sum(CountD(ID)).
If Running_Sum - Lookup(Running_Sum,-5) = 5, we have 5 consecutive days for the ID!
2. Solution with Moving Window Sum
Here we create a 5-day moving window sum, using a lesser known window function feature:
Window_Sum(CountD(ID), -4,0)
If the 5-day Window_Sum=5, we have 5 consecutive days!
The last two arguments in the above formula are Start and End positions of the window. We can define them relative to the current position 0. We can also use First() and Last() to define the positions.
Thus we have following equations:
Window_Sum(CountD(ID), First(), Last()) = Window_Sum(CountD(ID))
Window_Sum(CountD(ID), First(),0) = Running_Sum(CountD(ID))
Window_Sum(CountD(ID), 0, Last()) = Reverse Running_Sum(CountD(ID))
A companion workbook is included for download.
Feel free to leave comments below or contact me at twitter @aleksoft .
Both these solutions use table calcs, which works well when you are already fetching the values that you need to traverse to the Tableau client. If you have a very large data set and you want these calculations to happen on the server side instead, then another option is to use a SQL windowing query (sometimes called an analytic query).
ReplyDeleteThen you can either access those results with a bit of custom SQL as one part of your Tableau data model (*not* one giant SQL statement to replace your Tableau data model) or make a view in your database that provides the new info.
SQL analytic queries are very useful but under-appreciated technique. In many ways, Tableau's table calcs are simply a client-side feature that mirrors what analytic queries can do on the server-side.
BTW, Hyper also implements analytic queries - you can see them from the Hyper API