Introduction to Window Functions

Introduction to Window Functions

Window functions are specialized SQL functions used for performing calculations across a set of rows related to the current row. Unlike standard aggregate functions, window functions maintain individual row integrity, rather than collapsing the data into a single output row. They are particularly useful for running totals, moving averages, and cumulative statistics.

Before continuing reading the article, we recommend watching this YouTube video:

 

 

Key Concepts in Window Functions

  • Window: The set of rows a window function operates on.
  • Partitioning: Segmenting the data into smaller groups for independent processing by the window function.
  • Ordering: Establishing the sequence for processing the rows within a window.
  • Frame Specification: Outlining a specific subset of rows within a partition for the function's consideration (e.g., rows between the current row and the next 10 rows).
Basic Syntax of Window Functions

The general structure for a window function is as follows:

SELECT
  [column_name],
  [window_function]([column_name]) OVER (
    [PARTITION BY [partition_column]]
    [ORDER BY [ordering_column]]
    [frame_clause]
  )
FROM   [table_name];

 

Practical Examples:
Example #1. Running Total of Sales

Task: Compute a running total of sales for each order date.

-- CTE for calculating daily sales sum
WITH   OrderDateSum AS(   SELECT     Order_Date, -- Date of the order     SUM(Sales) AS Sales, -- Sum of sales for each day   FROM     `dataacademykz.superstore.sales_analytics` -- Source table   GROUP BY     Order_Date -- Grouping by order date   ORDER BY     Order_Date -- Ordering by order date   )
SELECT   Order_Date, -- Date of the order   Sales, -- Sales amount for each day   SUM(Sales) OVER (ORDER BY Order_Date) AS Running_Total_Sales -- Calculating running total of sales FROM   OrderDateSum -- Referencing the CTE

How did we do that? You are already familiar with sub-queries from the previous lesson, right? We created a simple temporary table called OrderDateSum, where we calculated the sum of sales for each order date.

Then we query this temporary to table to get the final result – running total of sales using window function SUM(Sales) OVER(ORDER BY Order_Date). This is a window function that computes the sum of all sales up to the current row, in the order specified by Order_Date.

  • The OVER clause specifies the window over which the SUM function is applied. In this case, it orders the rows by Order_Date without partitioning, meaning the running total is calculated over the entire result set

Our result looks like this:

 

Example #2. Ranking Products within Categories

Task: Assign a sales-based rank to each product within its category.

-- Selecting sales data and ranking products within each category by sales
SELECT   Category, -- Category of the product   Product_Name, -- Name of the product   Sales, -- Sales amount   RANK() OVER (PARTITION BY Category ORDER BY Sales DESC) AS Sales_Rank, -- Ranking of sales within each category FROM   dataacademykz.superstore.sales_analytics -- Source table

Explanation: This query assigns a rank to each product within its category based on its sales. 

RANK() Window Function

RANK(): This function assigns a unique rank to each row within a partition of a result set. The rank of a row is one plus the number of ranks that come before the row. If two or more rows tie for a rank, they receive the same rank, with the next rank(s) skipped. So, if two products have the highest sales and tie for rank 1, the next product will have a rank of 3.

OVER Clause

PARTITION BY Category: This directive tells the RANK() function to reset the ranking when the category changes. In other words, each category will have its own set of ranks, starting from 1. This is similar to grouping but without collapsing the rows into a single output row for each group.

ORDER BY Sales DESC: Within each category, this orders the products by their sales in descending order (from highest to lowest). The RANK() function uses this order to assign ranks to each product.

If you run the query correctly, you will get the result like this one:

 

Example #3. Moving Average of Sales

Task: Determine a 3-day moving average of sales for each product.

-- Selecting sales data along with a moving average calculation for each product
SELECT   Product_Name, -- Name of the product   Order_Date, -- Date of the order   Sales, -- Sales amount   AVG(Sales) OVER (     PARTITION BY Product_Name  -- Partitioning by Product Name     ORDER BY Order_Date  -- Ordering by date     ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS Moving_Avg_Sales, -- Moving average of sales for each product FROM   dataacademykz.superstore.sales_analytics -- Source table

Let us explain what is happening in our window function:

AVG(Sales):

  • This is an aggregate function that computes the average value of a set of numbers. In this case, it's being used to calculate the average sales amount.

OVER Clause:

  • This defines a window over which the AVG function will operate. Instead of calculating the average over the entire dataset or a group of rows, it calculates the average over a specified range of rows related to the current row.

PARTITION BY Product_Name:

  • This partitions the data into subsets, one for each product. The window function will restart its calculations for each product.

ORDER BY Order_Date:

This orders the rows within each partition by the order date. The order of rows is important because it determines which rows are considered preceding and following.

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING:

  • This is a frame specification that defines the set of rows to include in the window for each row's calculation. For each row, the window will consist of:
    • The current row.
    • The row immediately preceding it (1 row back).
    • The row immediately following it (1 row forward).

For example, if you look at Row #4, the moving average of sales calculates this way:

(Current row (25) + Preceding row (12) + Following row (5)) / 3 = 14.0