Monday, January 3, 2022

Pre-Filters, Post-Filters and the Order of Operations in SQL

In data analysis, we need to use filters here and there. In general, we would classify them as pre-filters or post-filters for better understanding of their respective mechanisms.

A pre-filter works on the data set. It only keeps the data we need for the analysis.
A post-filter works on the results. It only keeps the results we need.
In between, calculations are performed on the data, such as aggregations, ranking etc.

In SQL, 
WHERE statement defines what data to keep. 
HAVING statement defines what results to keep.

Thus WHERE is a pre-filter and HAVING is a post filter.

WHERE the pre-filter works on the dataset and defines what rows to keep with conditions like STATE='California' and CITY='San Francisco'.

HAVING the post-filter works on the results after various aggregations or calculations of data. For example, HAVING Rank>=5 to get top 5 results only. The rest of the ranking results are disgarded.

It is an important concept in Tableau that some filters are pre-filters (ex. dimension filters) and some post-filters (ex. table calc filters). I found it interesting to apply the same concept on SQL operations as well. Here is a tweet I posted a year ago:
Pre and post filters are part of SQL operations. Here is the order of SQL operations:

Feel free to leave comments or contact me at twitter @aleksoft



2 comments:

  1. My understanding is that UNION happens after SELECT and FROM happens first, at the same time as JOIN but I've yet to see how or why this matters. Just using this as reference: https://vladmihalcea.com/sql-operation-order/

    ReplyDelete
    Replies
    1. Very good question! Union can be used to combine source data sets or result data sets.

      When multiple source data sets are involved, we do need a Select * From to get the data first. In this sense, we have to put Union after Select * From.

      Sometimes, Join/Union can be done in other tools like Excel or Python before being processed by SQL. In a pure SQL environment, it's true that Union will always be after Select statement.

      Delete