1. In my previous post, Creating Jump Plot, we used the Bézier function to draw the curves. But the Bézier curve is not the only option.

    Today, we would like to show you how to use the familiar sine function and triangular function to draw those curves. I got the inspiration from a viz by Sebastián Soto Vera. Check it out; it is a great use case for the jump plot.

    We will use the same data set as in my last post. To keep the calculations simple, we will focus only on contiguous sequences, that is, those that go through all the check points.

    Simplified Steps
    - Densification is still the same, based on the Sequence.
    - [t simple], with values between 0 and 1, is the position between 2 check points:
    • [t simple] = (Index()-1)%100/100
    - The X coordinate will be Index().
    - Y is the measure, which has to be available at each densified point.

    The varieties of jump plot differ only in the way we calculate Y:

    Sine Jump Plot
    Circular/Elliptic Jump Plot
    Triangular Jump Plot
    Bézier Jump Plot
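
    The original formulas live in the workbook, so the following Python sketch is only one plausible reading of each curve name. It assumes every curve starts and ends at 0 and peaks at a given height halfway between two check points; the function names and the `height` argument are illustrative, not taken from the viz.

```python
import math

POINTS_PER_SEGMENT = 100  # densified points between two check points


def t_simple(index):
    """Mirror of [t simple] = (Index()-1)%100/100 for a 1-based index."""
    return ((index - 1) % POINTS_PER_SEGMENT) / POINTS_PER_SEGMENT


def sine_y(t, height):
    # Half a sine wave: 0 at t=0, peak at t=0.5.
    return height * math.sin(math.pi * t)


def circular_y(t, height):
    # Upper half of an ellipse centred at t=0.5 (assumed shape).
    return height * math.sqrt(max(0.0, 1 - (2 * t - 1) ** 2))


def triangular_y(t, height):
    # Straight line up to the peak at t=0.5, then straight down.
    return height * (1 - abs(2 * t - 1))


def bezier_y(t, height):
    # Quadratic Bézier with both end points at 0, scaled so the peak equals height.
    return 4 * height * t * (1 - t)


if __name__ == "__main__":
    for index in (1, 26, 51, 76):
        t = t_simple(index)
        print(index, t, round(sine_y(t, 10), 3), round(triangular_y(t, 10), 3))
```

    Swapping one function for another changes only Y; X and [t simple] stay the same, which is the point of the simplified steps.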
    Here are the resulting charts:
    As you can see, other functions can be used to create a jump plot as well. You may replace the above with your favorite sexy curves. Click the above image to access the viz.

  2. Introduction
    A jump plot is used to visualize process measurements.

    While researching Bézier curves, I finally got a chance to take a closer look at Chris DeMartini's masterful vizzes. Taking apart his jump plots, I found something strange: I couldn't understand the data structure at first.

    I eventually realized that he had transformed the data set in order to use piece-wise Bézier curves, so that he could create jump plots. This almost doubled the size of the data set.

    By digging a bit more into the data, I found that we can make do with the regular data structure, saving both the effort of data transformation and the extra rows.

    The following uses the Jump Plot Overview viz as a reference, in particular "Step 8: Threshold Based w Percentage".

    Data structure
    All we need are three columns: Series ID, Sequence #, Event Date. We assume that the dates move forward along with the Sequence # (the check point number).

    Assumption
    We need to make sure that the event dates agree with this assumption. The data in the original Jump Plots viz do not seem to have dates that move forward in sequence in every series, although they are supposed to. We confirmed this with Chris DeMartini, the author and Tableau Zen Master, and reordered the data to fix it.

    The steps for creating the jump plot are as follows. Only the most important calculations are shown. More details can be found in the workbook.

    1. Indexes
    First we will set up BézierValue to create the corresponding bins, which will serve as the base for data densification. In our example, we will densify the data to 601 points between 6 check points.

    BézierValue is derived from the Sequence #s or check points. Between two contiguous check points, we will add 100 data points.
    We will create [t], the variable required for creating a Bézier curve. [t] must be a number between 0 and 1. It is the actual index of the points along the Bézier curve.
    2. X Calc and Non-contiguous sequences
    Usually we have to go through the check points in sequence. Thus the sequence numbers in a series are contiguous.
    Sometimes we only have a sequence (Series ID=2) like 1,2,4,5,6, where 3 is missing. Then we have to link 2 and 4 with a piece of Bézier curve. The calculation is a bit different: the distance between the 2 points is no longer 1 but 2. So for each data mark, we have to calculate X_P_Max and X_P_Min, in case the sequence numbers are not contiguous.
    3. Y Calc and Date differences
    The Y value is proportional to the date difference between two consecutive check points. With some table calculations, we can get the dates of the two consecutive check points. Our calculation doesn't require transforming the data set. Since we make sure the dates move forward, we don't need to take ABS() of the date differences.
    We also added a [CheckPoint] field to show the check points on a dual axis.
    4. Bézier Curves
    Between two consecutive sequence points or check points, there is a piece of Bézier curve. Given 6 sequence points, we will have at most 5 pieces of Bézier curve. Note that if X_P_Max-X_P_Min = 1, the formula can be further simplified.
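
    Steps 2 to 4 can be sketched outside Tableau as follows. This is a minimal Python illustration, not the workbook's table calculations: it assumes each piece is a quadratic Bézier curve whose control point sits midway between X_P_Min and X_P_Max, at a height equal to the date difference in days. The names `bezier_piece` and `jump_plot` are made up for this sketch.

```python
from datetime import date


def bezier_piece(x_min, x_max, height, steps=100):
    """Quadratic Bézier from (x_min, 0) to (x_max, 0), control point at (midpoint, height)."""
    x_mid = (x_min + x_max) / 2
    points = []
    for i in range(steps + 1):
        t = i / steps  # [t] runs from 0 to 1 along the piece
        x = (1 - t) ** 2 * x_min + 2 * (1 - t) * t * x_mid + t ** 2 * x_max
        y = 2 * (1 - t) * t * height  # end points are at 0, so only the control term remains
        points.append((x, y))
    return points


def jump_plot(events, steps=100):
    """events: list of (sequence_no, event_date), sorted by sequence_no, dates moving forward."""
    pieces = []
    for (seq_a, date_a), (seq_b, date_b) in zip(events, events[1:]):
        days = (date_b - date_a).days  # no abs() needed when dates move forward
        pieces.append(bezier_piece(seq_a, seq_b, days, steps))
    return pieces


if __name__ == "__main__":
    # A series like Series ID=2 in the text: check point 3 is missing.
    series = [
        (1, date(2018, 1, 1)),
        (2, date(2018, 1, 5)),
        (4, date(2018, 1, 15)),
        (5, date(2018, 1, 16)),
        (6, date(2018, 1, 20)),
    ]
    for piece in jump_plot(series):
        print(piece[0], piece[len(piece) // 2], piece[-1])
```

    Because each piece takes its own x_min and x_max, the gap between check points 2 and 4 is simply a wider curve; no extra rows are needed for the missing check point.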
    Conclusion
    Instead of performing an extra transformation on the data set, we use the available data directly. The values needed for the Bézier curve are created using table calculations. This saves us the trouble of extra data preparation and makes creating a jump plot a bit easier.

    Click here to go to the interactive version.

  3. The other day, Rajeev Pandey privately asked me a question about a calculation in his dashboard. I didn't have time to look into it. Fortunately the gracious Simon Runc found that the issue was in the data densification.

    Rajeev is trying to replicate a viz by Zen Master Rody Zachovich, which seems to be inspired by a viz of Cody Crouch.

    I spent some time looking into both vizzes. I found a few computational solutions which may be interesting to share.

    1. Data set reduced to 2 rows
    Only two initial rows per curve are needed in Bézier curve calculation, even when the curve is a concatenation of multiple piece-wise Bézier curves.

    In Rody Zachovich's NFL viz, we found that only Path={0,2} are necessary per curve instead of 4 rows. Rody used two pieces of Bézier curve and hid part of it.
    In Cody Crouch's Golf viz, we found that only Path={0,3} are necessary per curve instead of 4 rows. It consists of 3 pieces of Bézier curve, although the last piece is actually a straight line.
    All the remaining data marks can be generated from the two rows by data densification.

    2. One Bézier curve instead of two
    Rody made use of two pieces of Bézier curve in his viz. I found that, given it's a single concave curve, we need only one Bézier curve to approximate it. So I changed the formula, and the resulting viz looks the same. The total number of data marks is cut by half. The curve still looks very smooth. The formula is a bit simpler too.

    By the same token, Cody's viz can be designed with one Bézier curve plus a straight line. I will leave it to whoever wants to give it a try.

    3. Simplify calculation
    Bézier gave his famous quadratic equations as follows:
    This is the general form. In special cases, these functions can be simplified.

    For example, when Y0=Y2=0, we have Y's calculation simplified as follows:
    Then we replaced part of the calculation in X by Y. Note that 0.6 is a control coefficient on the horizontal position of the highest point.
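
    The formula images are not reproduced here, so as a reminder, the standard quadratic Bézier form is B(t) = (1-t)^2*P0 + 2*(1-t)*t*P1 + t^2*P2. The short Python check below shows how the Y part collapses to a single term when Y0=Y2=0. It is a sketch of that special case only; the X tweak with the 0.6 coefficient is specific to the workbook and is not reconstructed here. The function names are illustrative.

```python
import math


def quadratic_bezier(t, p0, p1, p2):
    """General quadratic Bézier value for one coordinate."""
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2


def simplified_y(t, y1):
    """Special case Y0 = Y2 = 0: only the control-point term survives."""
    return 2 * (1 - t) * t * y1


if __name__ == "__main__":
    y1 = 7.5  # arbitrary control-point height for the demonstration
    for i in range(11):
        t = i / 10
        general = quadratic_bezier(t, 0.0, y1, 0.0)
        assert math.isclose(general, simplified_y(t, y1), abs_tol=1e-12)
    print("general and simplified forms agree")
```

    Note that the curve peaks at t=0.5 with a height of Y1/2, so the control point has to be twice as high as the desired peak.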
    That's the tweak of the day. The purpose here is to simplify the application of Bézier curve.

    Have fun with Tableau! Feel free to view and download the modified viz.

  4. How to chart user growth? It sounds like a simple question, yet I have been asked about it a number of times. It may not be as straightforward as one might think. It seems no one has documented it, so I decided to write it down.

    What is user growth?
    - It is not the count of active users per month/week/day, nor the accumulation thereof.
    - It is the accumulation of unique users over time.

    It takes 3 steps to do it. For illustration, we will use the Superstore data set and chart the growth of customers who ordered products from the superstore.

    1. Find the first time a user's activity is recorded.
    • [First Order Date]={fixed [Customer Name]: min([Order Date])}
    We will use [First Order Date] as the date axis.

    2. Count the first-time users.
    • CountD([Customer Name])
    3. Calculate the running total of the above count of customers.

    Right click the CountD(Customer Name) pill and select the Running Total in Quick Table Calculations.
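
    For readers who want to check the logic outside Tableau, the three steps can be sketched in plain Python. This is only an illustration of the idea, not a replacement for the calculated fields; the function name `user_growth` and the sample orders are made up.

```python
from collections import Counter
from itertools import accumulate


def user_growth(orders):
    """orders: iterable of (customer, order_date). Returns [(date, cumulative unique customers)]."""
    # Step 1: first order date per customer, like {fixed [Customer Name]: min([Order Date])}.
    first_order = {}
    for customer, order_date in orders:
        if customer not in first_order or order_date < first_order[customer]:
            first_order[customer] = order_date

    # Step 2: count the first-time customers on each first order date.
    new_per_date = Counter(first_order.values())

    # Step 3: running total along the date axis.
    dates = sorted(new_per_date)
    totals = accumulate(new_per_date[d] for d in dates)
    return list(zip(dates, totals))


if __name__ == "__main__":
    sample = [
        ("Ann", "2018-01-03"),
        ("Bob", "2018-01-01"),
        ("Ann", "2018-01-01"),
        ("Cid", "2018-02-01"),
        ("Bob", "2018-02-01"),
    ]
    print(user_growth(sample))
```

    Repeat orders by Ann and Bob do not move the curve, because each customer is counted only once, on the date of the first order.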

    Voilà, it's done. Click the image below to access the viz. A step chart version is also included. Zen Master Rody Zachovich used one in a recent viz, and Tableau will integrate it in the coming version 2018.1.


  5. Ups and downs are the simplest indicators of trends. They are so simple that they may entice people to look further into the data. I just want to emphasize that they are simple but important in data visualization.

    The one and only Rody Zachovich created a great dashboard on World Economical Freedom Indicator in the MakeoverMonday Week9 project. As always, his dashboard is crisp and full of creativity.
    Usually a #MakeoverMonday project is supposed to be done in a couple of hours. The dashboard is outstanding given the time frame.

    I think that if we color the trend lines to accentuate the ups and downs, it may look even better. Below is the result. See if you agree that adding colors makes it look better.
    That's the tweak of the day. Click images to go to the interactive version.


  6. Chris Mc on Twitter is kind of amazed that a rather sophisticated graph like the Julia set can be generated using only 8 rows of data.
    I am going to show that we can reduce that to 2 rows.

    In visualizing math functions, we can usually use a fairly small data set as seed, such as 2 rows for one-dimensional graphs or 8 rows for 3-dimensional graphs. The rest of the data can be derived via data densification, a special tool in Tableau.

    Basically the data densification allows us to create an indexed grid along each of the dimensions, on which the math functions will be drawn.

    Usually we need 2^N rows of data as seed to create an N-dimensional data grid. For example, for a 3-d grid, we would start with a seed table of 8 rows like:
    I just figured out a way to do it using 2 rows. Here are the steps:

    - Create 2 rows in a single column: (the other columns are all calculated fields.)
    • Seed
    • 1
    • 2
    - Create any dimensions as follows. Say there are 3 dimensions: x_base, y_base, z_base. Each of them can be created using the following formula: (This will define the range of each dimension.)
    • Case [Seed]
    • When 1 then 1
    • When 2 then [points]
    • End
    This defines the range of dimension x_base from 1 to a dynamic parameter [points]. This is from an example on Julia Set visualization.

    - Create bins for each of x_base, y_base, z_base with step size 1.

    - Drag x_base(bin), y_base(bin), z_base(bin) to the Details shelf. These bins are the bases for data densification.

    - Create Index() and drag it to the Details shelf. Set it to compute along all dimensions: x_base(bin), y_base(bin), z_base(bin). This will trigger the data densification in all 3 dimensions. This is the most consistent way to do it. It took a while for me to figure this out because it can be tricky to trigger the densification.

    Voilà, the above are the essential steps for creating a 3-dimensional grid through data densification. The size of each dimension can be fixed or dynamic.
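
    To make the idea concrete, here is a rough Python analogue of what the densification produces. It does not model how Tableau triggers densification internally; it only assumes that each dimension is filled in from 1 to [points] with step size 1 and that Index() numbers the resulting cells. The names `dimension_range` and `grid` are hypothetical.

```python
from itertools import product

SEED = [1, 2]  # the only two rows of real data


def dimension_range(points):
    """Values of one dimension after binning with step size 1."""
    # Same idea as: Case [Seed] When 1 then 1 When 2 then [points] End
    base = [1 if seed == 1 else points for seed in SEED]
    # The bins fill in every integer between the two seed values.
    return range(min(base), max(base) + 1)


def grid(points_x, points_y, points_z):
    """All cells of the 3-d grid, one tuple per densified mark."""
    return list(product(dimension_range(points_x),
                        dimension_range(points_y),
                        dimension_range(points_z)))


if __name__ == "__main__":
    cells = grid(3, 4, 2)
    # Index() analogue: number the marks starting from 1.
    for index, cell in enumerate(cells, start=1):
        print(index, cell)
```

    Changing a `points` argument plays the role of changing the [points] parameter: the grid is rebuilt on the fly from the same two seed rows.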

    See examples in creating the Julia set and Mandelbrot set.

    Conclusion
    For visualizing mathematical functions, all you need is 2 rows of data as seed. You can derive the rest. A great benefit of this approach is being able to use parameters to explore various combinations, because we can create data sets on the fly.



  7. Fractals are always fascinating. I am totally mesmerized by them. Inspired by Zen Master Noah Salvaterra's work, I created the Julia set here using Tableau alone, both for generating all the data marks and for visualizing them, without using external tools like Python or R.

    The Julia set is the original fractal that inspired Mandelbrot to find his own set. The two are closely related: the Mandelbrot set is the set of parameters c for which the corresponding Julia set is connected. Leo Newman actually created a few Julia set vizzes by creating the data set first and then visualizing it in Tableau.

    The problem with extra tools is that one needs to know extra languages and integrate the tools. Tableau is designed for people who are not professional programmers, and it may not be realistic to expect a Tableau user to know a serious programming language. However, it all depends on personal preferences and needs.

    Usually we use extra tools to generate the data set and then visualize it in Tableau. This may decrease flexibility. Actually, we can do everything natively in Tableau, both data set creation and visualization. Doing so may push Tableau's capability to the extreme, and so be it. It's like running LINPACK to test a new computer and assess its performance.

    Here I use Tableau alone to create the data set and the visualization, using similar optimization techniques as before. But it is still quite slow for my taste.

    Julia set provides us with a great variety of fractals which are wildly beautiful. You are welcome to download it, play with it and leave questions in the comment area. More initial parameters can be found in Wikipedia's Julia set entry for creating various Julia set images.
    Conclusion and issues
    Tableau provides me with a wonderful graphical tool, and I get great personal satisfaction from exploring the beauty of mathematics. Beyond that, I feel that rendering fractals and math functions can be used to push the boundaries of Tableau's computation engine: data structure, memory allocation, compiler, rendering. Maybe there is something to rethink, because there seems to be much room for improvement even in the simple computing of fractals.

    The higher the number of points (in the horizontal or vertical dimension) and the higher the number of iterations, the better the resolution of the image. But it will be really slow if the total number of pixels is greater than a few million.
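
    To show why the cost grows so quickly, here is a plain escape-time sketch of the Julia set in Python. It is not the string-based Tableau calculation from the workbook; it simply assumes the usual iteration z -> z*z + c with an escape radius of 2. The parameter names `points`, `max_iter` and `extent`, and the sample value of c, are choices made for this illustration.

```python
def julia_escape_counts(c, points=40, max_iter=30, extent=1.5):
    """Iteration counts on a points x points grid covering [-extent, extent] in x and y."""
    counts = []
    for row in range(points):
        y = extent - 2 * extent * row / (points - 1)  # top row has the largest y
        line = []
        for col in range(points):
            x = -extent + 2 * extent * col / (points - 1)
            z = complex(x, y)
            n = 0
            # Iterate until the point escapes or the iteration budget runs out.
            while n < max_iter and abs(z) <= 2:
                z = z * z + c
                n += 1
            line.append(n)
        counts.append(line)
    return counts


if __name__ == "__main__":
    c = complex(-0.8, 0.156)  # sample parameter, chosen only for illustration
    for line in julia_escape_counts(c, points=40, max_iter=30):
        # Points that never escaped within max_iter are drawn as '#'.
        print("".join("#" if n == 30 else "." for n in line))
```

    The work is roughly points x points x iterations, so doubling the points in both dimensions quadruples the cost before the iteration count is even raised.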

    Most computation above uses a long string to store a state vector (or vector state?). It includes the x and y coordinates of each data mark. The manipulation of this long string seems to cost a lot of computation time.

    Does introduction of local variables (registers) inside the formula editor help? Just a thought.

    Using Tableau alone, one can already create a variety of fractal images. It may be a little slow, but it's simple and a lot of fun. You can download the workbook here and find the initial c parameters in the above wiki link or over the web.

    Note: try to start with just a few iterations and a small number of points. Otherwise it may take a long time. Also, disable auto-update in the worksheet or dashboard menu while you set up the parameters. Have fun!