20. Navigating Arrays in Google BigQuery
Navigating Arrays in Google BigQuery
Introduction
In Google BigQuery, arrays provide a versatile way to store and manipulate sets of values within a single row. Arrays can greatly simplify data structures and are a powerful tool in the hands of SQL practitioners. This article will introduce you to arrays in BigQuery, demonstrate how to work with them, and offer practical examples to enhance your understanding.
Cloud Next'18 Video:
What is an Array in BigQuery?
An array is a data type in BigQuery that stores an ordered list of elements, all of which have the same data type. Arrays are particularly useful when dealing with multiple values that are naturally related but need to be contained within a single row.
Creating and Storing Arrays
Arrays in BigQuery can be created and populated in several ways. Here's a simple example:
CREATE TABLE
project.dataset.table (
id INT64,
tags ARRAY<STRING>
);
This table has a column tags
that holds an array of strings. You can insert data into this table as follows:
INSERT INTO
project.dataset.table
(id, tags)
VALUES
(1, ["bigquery", "sql", "arrays"]);
Querying Arrays
Querying arrays involves understanding how to access the individual elements and manipulate them. You can use the UNNEST
function to expand an array into a set of rows:
SELECT
id, tag
FROM
project.dataset.table,
UNNEST(tags) AS tag;
This query will return a separate row for each element in the tags
array.
Array Functions and Operators
BigQuery provides several functions and operators to work with arrays. For instance, you can find the length of an array using ARRAY_LENGTH()
:
SELECT
ARRAY_LENGTH(tags)
FROM
project.dataset.table;
Complex Array Manipulation
Arrays can also be nested, allowing for more complex data structures. You can query nested arrays or even arrays of structs. The key is to use UNNEST
at the appropriate level and efficiently access the nested data.
Example:
SELECT
id, tag_detail.tag
FROM
project.dataset.table,
UNNEST(tags_detail) AS tag_detail;
Here, tags_detail
could be an array of structs, and you're accessing a field tag
within each struct.
Conclusion
Arrays in Google BigQuery offer a flexible way to handle multiple related data items within a single row. By learning how to create, query, and manipulate arrays, you can significantly enhance your SQL querying capabilities in BigQuery. Whether it's simple lists or more complex nested structures, arrays are a potent tool for effective data analysis in BigQuery.
Array Function Examples in Google BigQuery:
-
ARRAY_LENGTH()
- Returns the number of elements in the array.
- Example:
SELECT ARRAY_LENGTH(tags) FROM project.dataset.table;
-
ARRAY_CONCAT()
- Concatenates two or more arrays to form a single array.
- Example:
SELECT ARRAY_CONCAT(array1, array2) FROM project.dataset.table;
-
ARRAY_TO_STRING()
- Converts an array into a string, with elements separated by a specified delimiter.
- Example:
SELECT ARRAY_TO_STRING(tags, ', ') FROM project.dataset.table;
-
ARRAY_REVERSE()
- Returns an array with the elements in reverse order.
- Example:
SELECT ARRAY_REVERSE(tags) FROM project.dataset.table;
-
ARRAY_AGG()
- Aggregates values into an array.
- Example:
SELECT ARRAY_AGG(column_name) FROM project.dataset.table GROUP BY group_column;
-
UNNEST()
- Expands an array into a set of rows. Often used in a
FROM
clause orJOIN
. - Example:
SELECT id, tag FROM project.dataset.table, UNNEST(tags) as tag;
- Expands an array into a set of rows. Often used in a
-
ARRAY_CONTAINS()
- Returns
TRUE
if the array contains a certain element. - Example:
SELECT ARRAY_CONTAINS(tags, 'bigquery') FROM project.dataset.table;
- Returns
-
GENERATE_ARRAY()
- Generates an array of integers from a start value to an end value with an optional step.
- Example:
SELECT GENERATE_ARRAY(1, 5) AS number_array;
- creates an array[1, 2, 3, 4, 5]
.
-
ARRAY(SELECT ...)
- Creates an array from a subquery's result.
- Example:
SELECT ARRAY(SELECT column FROM project.dataset.other_table) FROM project.dataset.table;
-
SAFE_OFFSET() and SAFE_ORDINAL()
SAFE_OFFSET(N)
returns the element at the zero-based positionn
,NULL
if out of bounds.SAFE_ORDINAL(N)
returns the element at the one-based positionn
,NULL
if out of bounds.- Example:
SELECT tags[SAFE_OFFSET(1)] FROM project.dataset.table;
These functions enable a wide range of operations on arrays in BigQuery, from simple transformations to complex aggregations and expansions. They are essential tools for efficiently handling array data types in your queries.