Introduction
Jump plots are used to visualize process measurements.
While researching Bézier curves, I finally got a chance to take a closer look at Chris DeMartini's masterful vizzes. Taking apart his jump plots, I found something strange: at first I couldn't understand the data structure.
I eventually realized that he had transformed the data set so that he could build the jump plots from piecewise Bézier curves. This almost doubled the size of the data set.
Digging a bit further into the data, I found that we can work with the regular data structure, which saves both the data transformation effort and the extra rows.
The following uses the Jump Plot Overview viz as a reference, in particular "Step 8: Threshold Based w Percentage".
Data structure
All we need are three columns: Series ID, Sequence #, and Event Date. We assume that, within each series, the dates move forward as the Sequence # (the check point number) increases.
Assumption
We need to make sure that the event dates satisfy this assumption. The data in the original Jump Plots viz does not appear to have sequentially increasing dates in every series, although the dates are supposed to move forward; we confirmed this with Chris DeMartini, the author and a Tableau Zen Master. We reordered the data to fix that.
The steps for creating the jump plot are as follows. Only the most important calculations are shown; more details can be found in the workbook.
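The check described above can be sketched outside Tableau. The following is a minimal Python illustration, not the workbook's method: the sample rows and the helper name find_out_of_order are hypothetical, and only the three columns named earlier are assumed.

```python
from datetime import date

# Hypothetical sample rows: (series_id, sequence_no, event_date)
rows = [
    (1, 1, date(2016, 1, 1)),
    (1, 2, date(2016, 1, 5)),
    (1, 3, date(2016, 1, 4)),  # out of order: earlier than check point 2
    (2, 1, date(2016, 2, 1)),
    (2, 2, date(2016, 2, 3)),
]


def find_out_of_order(rows):
    """Return (series_id, sequence_no) pairs whose date precedes
    the date of the previous check point in the same series."""
    bad = []
    last = {}  # series_id -> date of the previous check point
    for series_id, seq, event_date in sorted(rows, key=lambda r: (r[0], r[1])):
        if series_id in last and event_date < last[series_id]:
            bad.append((series_id, seq))
        last[series_id] = event_date
    return bad


violations = find_out_of_order(rows)
```

Any pair reported here marks a check point whose date would need to be reordered before the date differences can be used without ABS().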
1. Indexes
First we set up BézierValue to create the corresponding bins, which serve as the base for data densification. In our example, we densify the data to 601 points between 6 check points.
BézierValue is derived from the Sequence #s, or check points. Between two contiguous check points, we add 100 data points.
We then create [t], a variable required for building a Bézier curve. [t] must be a number between 0 and 1; it acts as the index of the points along each piece of Bézier curve.
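To make the role of [t] concrete, here is a small Python sketch of the densification for a single segment. It is an illustration only: the exact BézierValue and [t] formulas live in the workbook, and the function name t_values and the choice to include both end points are assumptions.

```python
POINTS_PER_SEGMENT = 100  # from the text: 100 points between two contiguous check points


def t_values(points_per_segment=POINTS_PER_SEGMENT):
    """Return evenly spaced t values from 0 to 1 (both inclusive) for one segment."""
    return [i / points_per_segment for i in range(points_per_segment + 1)]


ts = t_values()
```

Each value of t then picks out one mark on the curve piece between two check points, with t = 0 at the starting check point and t = 1 at the ending one.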
2. X Calc and Non-contiguous sequences
Usually we have to go through the check points in sequence, so the sequence numbers in a series are contiguous.
Sometimes, however, a series has a gap. Series ID = 2, for example, has the sequence 1, 2, 4, 5, 6, where 3 is missing. We then have to link 2 and 4 with a single piece of Bézier curve, and the calculation is a bit different because the distance between the two points is no longer 1 but 2. So for each data mark we calculate X_P_Max and X_P_Min, in case the check points are not contiguous.
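The idea of finding the surrounding check points for each mark can be sketched in Python as follows. This is not the Tableau table calculation from the workbook; the helper name bracket and the way t is rescaled over the wider gap are assumptions made for illustration.

```python
from bisect import bisect_right


def bracket(check_points, x):
    """Return (x_p_min, x_p_max), the check points surrounding position x.

    check_points must be sorted, and x must lie between the first
    and the last check point (inclusive).
    """
    i = bisect_right(check_points, x)
    if i == len(check_points):  # x sits exactly on the last check point
        i -= 1
    return check_points[i - 1], check_points[i]


series_2 = [1, 2, 4, 5, 6]  # check point 3 is missing
x_p_min, x_p_max = bracket(series_2, 3.25)
# Rescale the position to [0, 1] over the actual gap, which is 2 here, not 1.
t = (3.25 - x_p_min) / (x_p_max - x_p_min)
```

For a contiguous pair the divisor is 1 and the rescaling disappears, which is why only the non-contiguous case needs the extra care.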
3. Y Calc and Date differences
The Y value is proportional to the date difference between two consecutive check points. With some table calculations, we can get the dates of the two consecutive check points, so our calculation doesn't require transforming the data set. Since we have made sure the dates move forward, we don't need to take ABS() of the date differences.
We also added a [CheckPoint] field to show the check points on a dual axis.
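As a rough Python equivalent of the date-difference step, the sketch below pairs each check point with the next one in the same series. The function name day_gaps and the sample dates are hypothetical; in the workbook this is done with table calculations.

```python
from datetime import date


def day_gaps(events):
    """events: list of (sequence_no, event_date) for one series, sorted by sequence_no.

    Return (from_seq, to_seq, days) for each pair of consecutive check points.
    """
    return [
        (s0, s1, (d1 - d0).days)
        for (s0, d0), (s1, d1) in zip(events, events[1:])
    ]


events = [
    (1, date(2016, 1, 1)),
    (2, date(2016, 1, 5)),
    (4, date(2016, 1, 20)),  # check point 3 is missing in this series
]
gaps = day_gaps(events)
```

Because the dates are assumed to move forward, every difference is non-negative and no absolute value is needed.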
4. Bézier Curves
Between two consecutive sequence points (check points) there is one piece of Bézier curve. Given 6 sequence points, we have at most 5 pieces of Bézier curve. Note that if X_P_Max - X_P_Min = 1, the formula can be simplified further.
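One piece of the curve can be sketched in Python as below. The order of the curve and the placement of the control points are assumptions: this sketch uses a quadratic Bézier whose middle control point sits halfway between the two check points, which may differ from the formula in the workbook. The function names are hypothetical.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bézier curve with control points p0, p1, p2 at t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return x, y


def jump_arc(x_p_min, x_p_max, height, t):
    """One arc from (x_p_min, 0) to (x_p_max, 0).

    Assumption: the middle control point is at the midpoint in x and at
    `height` in y, so the arc peaks at height / 2 when t = 0.5.
    """
    mid = (x_p_min + x_p_max) / 2
    return quadratic_bezier((x_p_min, 0.0), (mid, height), (x_p_max, 0.0), t)


start = jump_arc(2, 4, 10.0, 0.0)
peak = jump_arc(2, 4, 10.0, 0.5)
end = jump_arc(2, 4, 10.0, 1.0)
```

Under this assumed construction the x coordinate reduces to x_p_min + t * (x_p_max - x_p_min), so when the two check points are contiguous it is simply x_p_min + t, in line with the simplification noted above.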
Conclusion
Instead of performing an extra transformation on the data set, we use the available data directly. The values needed for the Bézier curves are created with table calculations. This saves the trouble of extra data preparation and makes creating a jump plot a bit easier.
Click here to go to the interactive version.