A pre-filter works on the data set. It only keeps the data we need for the analysis.
A post-filter works on the results. It only keeps the results we need.
In between, calculations are performed on the data, such as aggregations, ranking etc.
In SQL,
WHERE statement defines what data to keep.
HAVING statement defines what results to keep.
Thus WHERE is a pre-filter and HAVING is a post filter.
WHERE the pre-filter works on the dataset and defines what rows to keep with conditions like STATE='California' and CITY='San Francisco'.
HAVING the post-filter works on the results after various aggregations or calculations of data. For example, HAVING Rank>=5 to get top 5 results only. The rest of the ranking results are disgarded.
It is an important concept in Tableau that some filters are pre-filters (ex. dimension filters) and some post-filters (ex. table calc filters). I found it interesting to apply the same concept on SQL operations as well. Here is a tweet I posted a year ago:
Wow!!Very useful and Super easy to follow.!!
— Rajeev Pandey (@rajvivan) January 29, 2021
Pre and post filters are part of SQL operations. Here is the order of SQL operations:
BTW here is the order of operations in Tableau.
My understanding is that UNION happens after SELECT and FROM happens first, at the same time as JOIN but I've yet to see how or why this matters. Just using this as reference: https://vladmihalcea.com/sql-operation-order/
ReplyDeleteVery good question! Union can be used to combine source data sets or result data sets.
DeleteWhen multiple source data sets are involved, we do need a Select * From to get the data first. In this sense, we have to put Union after Select * From.
Sometimes, Join/Union can be done in other tools like Excel or Python before being processed by SQL. In a pure SQL environment, it's true that Union will always be after Select statement.