Cross Join is quite useful in data manipulation: creating Sigmoid/Spline based charts (Sankey for example), data scaffolding and many other applications.
Some may wonder why Tableau doesn't offer Cross Join within its join interface. I guess that it's because there are so many ways we can do it already. Let me summarize them here.
1.Via Custom SQL with Union
Assume we want to cross join two tables: A and B. This works when one of the two tables is small, with just a few rows (and a few columns). Assume here B has only 2 rows and 1 column:
Let's open Table A (the Orders table in the Superstore data set) in an Excel file via Legacy Connection. This gives us the option to write Custom SQL.
This creates the cross join between the two tables! One application of it is creating a Sigmoid curve.
If B has more rows, we just need to add more such unions. If B has more columns, we need to explicitly spell them out for each and every element of the columns in the query.
As we mentioned before, this approach is good if B is small.
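The union idea above can be tried outside Tableau. Below is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for the legacy connection. The sheet name [Orders$], the trimmed-down columns, the key column [T] and the values 1 and 2 for table B are all assumptions made for illustration, not the query from the original workbook.

```python
import sqlite3

# In-memory stand-in for Table A; the sheet name and columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [Orders$] ([Order ID] INTEGER, [Sales] REAL)")
conn.executemany(
    "INSERT INTO [Orders$] VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

# One SELECT per row of B, each tagging every row of A with that B value.
# [T] and the values 1 and 2 are hypothetical placeholders for B's column.
union_sql = """
SELECT [Orders$].*, 1 AS [T] FROM [Orders$]
UNION ALL
SELECT [Orders$].*, 2 AS [T] FROM [Orders$]
"""
rows = conn.execute(union_sql).fetchall()
print(len(rows))
```

With 3 rows in A and 2 values for B, the result has 6 rows: every order paired with every value of T. Adding a third row to B means adding a third SELECT to the union.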
2.Via Custom SQL with Excel
Once I needed to create a table structure via cross join for scaffolding. I put the tables into separate sheets in one Excel file. Opening it via legacy connection and using Custom SQL, I was able to cross join them easily.
Select * from [Date$],[Product Category$],[Customer Segment$]
This approach is fairly universal and only needs a single line of SQL code.
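To see what that one line returns, here is a small sketch that runs the same statement in Python's sqlite3 module, which also accepts bracketed names and comma-separated tables. The sheet contents are made-up sample values; only the three sheet names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three single-column "sheets"; the values are invented for illustration.
sheets = {
    "Date$": ("Date", ["2017-01-01", "2017-02-01"]),
    "Product Category$": ("Category", ["Furniture", "Office Supplies", "Technology"]),
    "Customer Segment$": ("Segment", ["Consumer", "Corporate", "Home Office", "Small Business"]),
}
for name, (column, values) in sheets.items():
    conn.execute(f"CREATE TABLE [{name}] ([{column}] TEXT)")
    conn.executemany(f"INSERT INTO [{name}] VALUES (?)", [(v,) for v in values])

# Listing tables with commas and no join condition yields every combination.
scaffold = conn.execute(
    "SELECT * FROM [Date$],[Product Category$],[Customer Segment$]"
).fetchall()
print(len(scaffold))
```

The scaffold has 2 x 3 x 4 = 24 rows, one for each date, category and segment combination, which is exactly the structure needed for scaffolding.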
3.Via Tableau's Native Inner Join Dialog
Zen Master Jeffrey Shaffer described this cross join approach early on. Kettan wrote a great tutorial on it: create an extra, identical Join Key column in each joining table, then inner join on it. The Join Key column can be populated with the same number, such as 1, or the same string, such as 'Join'.
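Why this works: when every row on both sides carries the same key, the inner join condition is true for every pair of rows. A minimal sketch with Python's sqlite3 module, using hypothetical tables a and b and a hypothetical join_key column, compares the result with an explicit cross join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both tables carry a pre-populated join_key column holding the same value.
conn.execute("CREATE TABLE a (item TEXT, join_key INTEGER)")
conn.execute("CREATE TABLE b (t INTEGER, join_key INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, 1)", [("x",), ("y",), ("z",)])
conn.executemany("INSERT INTO b VALUES (?, 1)", [(1,), (2,)])

# Inner join on the constant key, as set up in the join dialog.
joined = conn.execute(
    "SELECT a.item, b.t FROM a INNER JOIN b ON a.join_key = b.join_key"
).fetchall()

# Explicit cross join for comparison.
crossed = conn.execute("SELECT a.item, b.t FROM a CROSS JOIN b").fetchall()

print(sorted(joined) == sorted(crossed))
```

Both queries return the same 6 pairs, so the inner join on a shared constant behaves as a cross join.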
4.Via Join Calculation
This approach needs neither SQL nor a pre-populated Join Key column. There is no need to reshape data, and it can work across different data sources. It's the most versatile approach.
In Tableau 10.2+, we can use "Join Calculation" to create Join Keys in both joining tables. Here we created a '1' column in both tables. Note that only the first Join Key can be 1, while the other has to be generated using a formula. The dialog just doesn't let me enter 1 in both Join Calculations. There is no need for a pre-populated Join Key any more.
I was told by Zen Master Chris Love that Bethany Lyons of Tableau demoed something similar at Tableau Conference 2016. After viewing the video, I found that she used Join Calculation in one table, and pre-populated the other table with a Join Key column. It works as well.
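In SQL terms, a join calculation amounts to computing the key on the fly instead of storing it. The sketch below, again in Python's sqlite3 module with hypothetical tables and column names, mimics both variants: keys calculated on both sides, and a calculated key on one side joined to a stored key on the other. It is an analogy for what the join dialog does, not a description of Tableau's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table a has no key column; table b has a pre-populated one.
conn.execute("CREATE TABLE a (item TEXT)")
conn.execute("CREATE TABLE b (t INTEGER, join_key INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [("x",), ("y",), ("z",)])
conn.executemany("INSERT INTO b VALUES (?, 1)", [(1,), (2,)])

# Variant 1: the key is calculated as the constant 1 on both sides.
both_calculated = conn.execute(
    "SELECT ak.item, bk.t "
    "FROM (SELECT item, 1 AS k FROM a) AS ak "
    "INNER JOIN (SELECT t, 1 AS k FROM b) AS bk ON ak.k = bk.k"
).fetchall()

# Variant 2: calculated key on one side, stored key column on the other.
mixed = conn.execute(
    "SELECT ak.item, b.t "
    "FROM (SELECT item, 1 AS k FROM a) AS ak "
    "INNER JOIN b ON ak.k = b.join_key"
).fetchall()

print(sorted(both_calculated) == sorted(mixed))
```

Both variants return the same 6 pairs, which matches the observation that the mixed setup works as well.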
Voila, a little review of the four approaches for Cross Join in Tableau. Pick the one that's appropriate for you. Let me know if you have different approaches.
Hi Alex, great post! FYI, the need for using a formula doesn't exist anymore. A problem during the 10.2 beta prevented a join on two join calculations that were both 1; it was fixed for the initial 10.2.0 release.
Thanks for the update! Now in the 4th approach, we can enter 1 in Join Calculations for both data sources. This makes it even simpler to create a cross join.