SQL for Data Analysis in Google BigQuery: 23. Embracing Procedural Language for Advanced SQL Operations

Embracing Procedural Language in Google BigQuery for Advanced SQL Operations

Advanced SQL

Introduction

Procedural Language in Google BigQuery extends the capabilities of standard SQL by allowing more complex operations and logic. This article delves into the realm of procedural SQL in BigQuery, demonstrating how it can enhance data processing tasks and solve problems that might be cumbersome with standard SQL.

What is Procedural Language in BigQuery?

Procedural Language in BigQuery refers to the use of scripting within SQL queries. It allows you to perform operations that are iterative, conditional, and complex in nature, much like a traditional programming language.

Key Features and Benefits

Variables: Declare and use variables to store temporary values.
Scripting Support: Write SQL scripts that can execute multiple statements in sequence.
Control Flow Constructs: Utilize IF-THEN-ELSE logic, WHILE loops, and FOR loops.
Exception Handling: Implement TRY-CATCH blocks for error handling.

Using Procedural Language in BigQuery

1. Variables

Scenario: Suppose you want to analyze sales data for a specific date range, but the date range is dynamic and changes frequently.
Solution: Use scripting to declare variables for the start and end dates, and then use these variables in your query. This approach makes it easy to change the date range without modifying the entire query.

        DECLARE start_date DATE DEFAULT '2021-01-01';
        DECLARE end_date DATE DEFAULT '2021-12-31';
        
        SELECT *
        FROM project.dataset.sales
        WHERE sale_date BETWEEN start_date AND end_date;

In this script, start_date and end_date are variables holding the date range. The SELECT query then uses these variables to filter the sales data. This script makes the query more flexible and easier to maintain.

2. Using Scripting for Sequential Execution

Scenario: Suppose you have a requirement to perform a series of data preparation steps on a sales dataset.

Solution: First, you need to create a temporary table to hold filtered data, then update some records based on certain conditions, and finally, aggregate this data into a summary table.

        -- Start of the script
        BEGIN

            -- Step 1: Create a temporary table to hold filtered data
            CREATE TEMP TABLE TempSales AS
            SELECT *
            FROM project.dataset.sales
            WHERE sale_date >= '2021-01-01' AND sale_date <= '2021-12-31';


            -- Step 2: Update records in the temporary table
            UPDATE TempSales
            SET amount = amount * 1.1
            WHERE category = 'Electronics';

            -- Step 3: Aggregate data into a summary table
            CREATE OR REPLACE TABLE project.dataset.sales_summary AS
            SELECT category, SUM(amount) AS total_amount, COUNT(*) AS total_transactions
            FROM TempSales
            GROUP BY category;
        END;
        -- End of the script

This script exemplifies how BigQuery's scripting allows for executing multiple SQL statements in a sequential and organized manner, enabling complex data processing workflows within a single, coherent script.

3. Control Flow Constructs:

Scenario: You are managing a database of orders, and you need to apply discounts to orders based on certain criteria. For each order:

If the order amount is greater than $500, apply a 10% discount.
For all orders, add a loyalty point for every $100 spent.
Stop the process once you've processed 100 orders (to simulate a batch process limit).

Solution:

        DECLARE order_id INT64;
        DECLARE order_amount FLOAT64;
        DECLARE discounted_amount FLOAT64;
        DECLARE loyalty_points INT64;
        DECLARE counter INT64 DEFAULT 0;

        -- Assuming 'orders' table has columns: id, amount, discounted_amount, loyalty_point
        FOR order_id, order_amount IN (SELECT id, amount FROM project.dataset.orders LIMIT 100
        DO
            SET discounted_amount = order_amount;
            SET loyalty_points = CAST(order_amount / 100 AS INT64);

            -- Apply discount if order amount is greater than $50
            IF order_amount > 500 THEN
                SET discounted_amount = order_amount * 0.9; -- Apply 10% discoun
            END IF;

            -- Update the order with the discounted amount and loyalty point
            UPDATE project.dataset.orders
            SET discounted_amount = discounted_amount, loyalty_points = loyalty_point
            WHERE id = order_id;

            -- Increment the counter
            SET counter = counter + 1;
            -- Exit the loop if 100 orders have been processe
            IF counter >= 100 THEN
                LEAVE;
            END IF;
        END FOR;

Explanation:

Initialization: Declares variables for order ID, amount, discounted amount, loyalty points, and a counter.
FOR Loop: Iterates through each order, limited to 100 orders for batch processing.
IF-THEN-ELSE Logic: Applies a 10% discount to orders over $500.
Loyalty Points Calculation: Calculates loyalty points based on the original order amount.
Update Orders: Updates each order with the new discounted amount and loyalty points.
Counter and Loop Control: Increments a counter for each processed order and checks if the limit of 100 orders is reached. If reached, the loop exits using LEAVE.

This script is an example of how control flow constructs in BigQuery scripting can be used to implement complex logic that would be cumbersome or impossible to achieve with standard SQL alone. It demonstrates the combination of a FOR loop for batch processing, IF-THEN-ELSE for conditional logic, and loop control mechanisms.

4. Exception Handling:

Scenario: Imagine you have a script that processes data from various tables. However, some tables might not exist or might be temporarily unavailable. You want to ensure that your script can handle such errors without failing entirely.

Solution:

        DECLARE table_names ARRAY<STRING> DEFAULT ['table1', 'table2', 'non_existent_table']
        DECLARE table_name STRING;

        BEGIN
            FOR table_name IN UNNEST(table_names) DO
                BEGIN
                    -- Attempt to process the table
                    TRY
                        EXECUTE IMMEDIATE FORMAT("SELECT COUNT(*) FROM project.dataset.%s", table_name)
                        -- If successful, output the name of the processed tabl
                        SELECT FORMAT("Processed table: %s", table_name) AS result
                    EXCEPTION WHEN ERROR THEN
                        -- If an error occurs (e.g., table does not exist), output an error message instea
                        SELECT FORMAT("Error processing table: %s", table_name) AS error_message
                    END TRY;
                END;
            END FOR;
        END;

Explanation:

Setup: A list of table names is defined, including a table that does not exist (non_existent_table).
FOR Loop: Iterates through each table name in the list.
TRY-CATCH Block Inside the Loop: For each table, the script tries to execute a simple query (counting the number of rows).
Error Handling: If the table does not exist or another error occurs, the script catches the exception. Instead of the script failing, it outputs an error message indicating the problem with the specific table.
Successful Execution Message: If the table is processed without errors, a success message is output.

This example demonstrates how TRY-CATCH blocks can be effectively used in BigQuery scripts to handle exceptions. This ensures that your script can continue running even if some parts of it encounter errors, which is particularly useful in scenarios involving multiple, potentially unreliable operations.