38
Window Functions in SQL: Part 1
In SQL, Window functions are functions that performs operations across a set of rows that are related to the row the function is currently operating on.
Window functions were first introduced in SQL in 2003 with functionality expanded in 2012 and are needed in SQL because they simplify certain complex operations and analysis and can be used to calculate running totals, moving averages, and growth over time amongst others.
The dataset below with a set of rows and columns has a window function operating on a particular column and the result extracted spans an entire new column as shown below.
From the diagram above, we can see a sample dataset with 2 columns (column_1 & column_2). Using the LAG function (which shall be explained later), we can see the result that is produced which spans an entire new column(lag). To get this, all that was needed using window functions was two lines of code as seen below
To start with, we shall look at the basic window functions, which include:
The Ranking functions:- Row number, Rank & Dense rank
The Fetching functions:- Lag, Lead, First_Value & Last_Value.
I created a dataset of Movie downloads which contains certain movie names, genre and number of downloads. This can be created using this SQL statement below.
Ranking functions are functions that assign numbers to rows in sequential order. To rank a column in a dataset, things like highest and lowest can be easily seen with a glance and it can be used as a reference (index) for other operations in SQL.
The different ranking functions have the same result with very few differences:-
ROW_NUMBER ranks the different rows starting from number 1. It is used mainly as an index for a dataset and can be used for easier reference to each row.
RANK also does the same as ROW_NUMBER above but assigns the same number(s) to identical values and skips the next value(s) for the number of times the number was repeated.
DENSE_RANK also assigns the same number(s) to identical values but doesn’t skip the next value(s) at all.
The movies dataset can be ranked as is with the 1st movie recorded as the 1st rank and using the ORDER BY function, it can be ranked from the lowest to the highest number of downloads (as shown below) or vice versa:
The OVER clause in lines 1,2 & 3 is a staple in all window functions and determines exactly how the rows of the query are split up for processing by the window function.
The ORDER BY clause in lines 2 & 3 is used inside the window functions' OVER clause specifying that the ORDER should be determined before the function is executed.
Looking at the output above, we can see the differences between the ROW_NUMBER(), the RANK() and the DENSE_RANK().
Let's take a look at the PARTITION BY clause below:
PARTITION BY
The Fetching functions work a bit differently from the ranking functions:
LAG returns the value at n rows before the current row.
LEAD returns the value at n rows after the current row.
LAG(downloads, 1)
LEAD(downloads, 2)
The last two window functions in this article are pretty straight forward.
The FIRST_VALUE returns the value of the first row in a table or partition and LAST_VALUE returns the value of the last row in a table or partition.
RANGE BETWEEN
RANGE BETWEEN
38