How to count current (or active) customers who ordered and to whom we delivered goods within last 4 weeks? A corollary question: what are the count of active customers at a past date?
This belongs to a general class of similar problems, which are all about counting.
- The number of active website users who logged in within one month
- The number of active sessions within 15 minutes
- The number of active customers who shopped at our outlets in the past 6 months
The list can go on. They all share the same characteristics:
- A new active period may start before another ends.
- The "Active" period has a fixed length following a starting date/time. After that, the status expires.
Let's see how the number of active customers in the beginning example can be counted and visualized over time using Tableau.
The data set consists of Customer IDs and a list of Delivery Dates for each customer. Those are the only two columns we need.
The main steps for counting the number of active customers are:
1. Create a calculated field [Expiration Date] by adding 28 days (4 weeks) to the [Delivery Date]
2. Pivot the above two columns [Delivery Date] and [Expiration Date] to become two new columns: [Date] and [Date Type]. See Counting Active Orders for details of doing this. (Go to the bottom of this post if you want Custom SQL for automation.)
3. Create a calculated field [Value]
4. Create a cumulative sum with a smoothing function [Running Value].
When two delivery dates have less than 28 days in between, the cumulative sum may become 2. But we only have one active customer. The smoothing function makes sure that there is only one active customer.
This is done at per customer level. We can visualize the periods where each customer is active. The following chart shows the contrast between RUNNING_SUM(SUM([Value])) and [Running Value]. The latter (Orange line) will always be either 1 or 0.
5. To count the total number of the active customers, we need to use a Window_Sum() to sum them up along the [Customer ID] dimension.
As usual, the Window function creates many rows of the same number. We just need one row. That is why we use First()=0 to keep only one. Note that we need to turn on "Showing missing values" in Day(Date) to show all the dates and make the pill to be discrete. Click image to download the workbook.
Regarding Data Preparation
Given a fixed period of being active, we can derive the expiration date from the delivery date. So, we have three basic columns with both dates in the same record. Then we need to pivot the table so that we have one row with the delivery date and another with the expiration date.
Unfortunately, the pivot function in Tableau's data source editor doesn't apply to calculated fields. To solve this, we can export the three-column table to a CSV file and re-import it to Tableau. Then we can apply the pivot to it.
The above technique may not lend itself to automation in some cases. To fully automate it, we have to use Custom SQL as described below:
The above SQL does both the calculation of the Expiration date and the pivot. Add more fields to the Select clause if you want to slice and dice the data afterwards. The SQL is for Microsoft's Jet SQL engine. The code may be a little different for different databases.
Thank you for this wonderful post
ReplyDeleteHi Alexander,
ReplyDeleteFrom Tableau online help, I can read that you can't pivot a calculated field... how do you manage to use expiration date in the pivot ?
Damien, you are right. Seems we can't pivot the column in Tableau when it's a calculated field. In this case, create the Expiration Date column in Excel before importing into Tableau. Then you can pivot the column.
DeleteHi Alexander,
ReplyDeleteIs there a way to retain the running count of active customers if I filter out according to a particular yy/month (If I have a "large" dataset, dating back to almost ten years), how can I still see active customers 5 years ago without losing the running count when filtering.
Is there any update on this ?
DeleteTry to use lookup() function to view the past period while keeping the running count.
Delete