Navigating Arrays in Google BigQuery

Kazakh Theatre

Introduction

In Google BigQuery, arrays provide a versatile way to store and manipulate sets of values within a single row. Arrays can greatly simplify data structures and are a powerful tool in the hands of SQL practitioners. This article will introduce you to arrays in BigQuery, demonstrate how to work with them, and offer practical examples to enhance your understanding.

Cloud Next'18 Video:

 

 

What is an Array in BigQuery?

An array is a data type in BigQuery that stores an ordered list of elements, all of which have the same data type. Arrays are particularly useful when dealing with multiple values that are naturally related but need to be contained within a single row.

Creating and Storing Arrays

Arrays in BigQuery can be created and populated in several ways. Here's a simple example:

        CREATE TABLE
        project.dataset.table (
            id INT64,
            tags ARRAY<STRING>
        );

This table has a column tags that holds an array of strings. You can insert data into this table as follows:

        INSERT INTO
        project.dataset.table 
        (id, tags)
        VALUES 
        (1, ["bigquery", "sql", "arrays"]);
Querying Arrays

Querying arrays involves understanding how to access the individual elements and manipulate them. You can use the UNNEST function to expand an array into a set of rows:

        SELECT
        id, tag
        FROM
        project.dataset.table,
        UNNEST(tags) AS tag;

This query will return a separate row for each element in the tags array.

Array Functions and Operators

BigQuery provides several functions and operators to work with arrays. For instance, you can find the length of an array using ARRAY_LENGTH():

        SELECT
        ARRAY_LENGTH(tags)
        FROM
        project.dataset.table;
Complex Array Manipulation

Arrays can also be nested, allowing for more complex data structures. You can query nested arrays or even arrays of structs. The key is to use UNNEST at the appropriate level and efficiently access the nested data.

Example:

        SELECT
        id, tag_detail.tag
        FROM
        project.dataset.table,
        UNNEST(tags_detail) AS tag_detail;   

Here, tags_detail could be an array of structs, and you're accessing a field tag within each struct.

Conclusion

Arrays in Google BigQuery offer a flexible way to handle multiple related data items within a single row. By learning how to create, query, and manipulate arrays, you can significantly enhance your SQL querying capabilities in BigQuery. Whether it's simple lists or more complex nested structures, arrays are a potent tool for effective data analysis in BigQuery.

 

Array Function Examples in Google BigQuery:
  1. ARRAY_LENGTH()

    • Returns the number of elements in the array.
    • Example: SELECT ARRAY_LENGTH(tags) FROM project.dataset.table;
  2. ARRAY_CONCAT()

    • Concatenates two or more arrays to form a single array.
    • Example: SELECT ARRAY_CONCAT(array1, array2) FROM project.dataset.table;
  3. ARRAY_TO_STRING()

    • Converts an array into a string, with elements separated by a specified delimiter.
    • Example: SELECT ARRAY_TO_STRING(tags, ', ') FROM project.dataset.table;
  4. ARRAY_REVERSE()

    • Returns an array with the elements in reverse order.
    • Example: SELECT ARRAY_REVERSE(tags) FROM project.dataset.table;
  5. ARRAY_AGG()

    • Aggregates values into an array.
    • Example: SELECT ARRAY_AGG(column_name) FROM project.dataset.table GROUP BY group_column;
  6. UNNEST()

    • Expands an array into a set of rows. Often used in a FROM clause or JOIN.
    • Example: SELECT id, tag FROM project.dataset.table, UNNEST(tags) as tag;
  7. ARRAY_CONTAINS()

    • Returns TRUE if the array contains a certain element.
    • Example: SELECT ARRAY_CONTAINS(tags, 'bigquery') FROM project.dataset.table;
  8. GENERATE_ARRAY()

    • Generates an array of integers from a start value to an end value with an optional step.
    • Example: SELECT GENERATE_ARRAY(1, 5) AS number_array; - creates an array [1, 2, 3, 4, 5].
  9. ARRAY(SELECT ...)

    • Creates an array from a subquery's result.
    • Example: SELECT ARRAY(SELECT column FROM project.dataset.other_table) FROM project.dataset.table;
  10. SAFE_OFFSET() and SAFE_ORDINAL()

    • SAFE_OFFSET(N) returns the element at the zero-based position n, NULL if out of bounds.
    • SAFE_ORDINAL(N) returns the element at the one-based position n, NULL if out of bounds.
    • Example: SELECT tags[SAFE_OFFSET(1)] FROM project.dataset.table;

These functions enable a wide range of operations on arrays in BigQuery, from simple transformations to complex aggregations and expansions. They are essential tools for efficiently handling array data types in your queries.