This task aims to uncover the most popular titles within each genre, offering valuable insights into customer preferences and market trends. By highlighting the top sellers across various categories, the bookstore can tailor its inventory and marketing strategies to meet customer demand effectively.
Objective: Identify the top three best-selling books in each literary genre at your bookstore, such as mystery, romance, or science fiction.
Expected Result: A list of the top three best-selling books for each genre, including their titles, genres, total sales, and sales rank within the genre.
Data Source: dataacademykz.bookstore.{table_name}
-- CTE for calculating total sales per book genre and ranking them WITH
GenreSales AS (
SELECT
b.genre, -- Genre of the book
b.title, -- Title of the bookROUND(SUM(od.quantity * od.price_per_unit), 1) AS total_sales, -- Calculating total sales for each titlRANK() OVER (PARTITION BY b.genre ORDER BYSUM(od.quantity * od.price_per_unit) DESC) AS rank, -- Ranking books within each genrFROM
`dataacademykz.bookstore.books` b
JOIN
`dataacademykz.bookstore.order_details` od
ON b.book_id = od.book_id
GROUP BY b.genre, b.title
) SELECT
genre, -- Genre of the bookCONCAT('Rank: ', rank, '. ', title) AS book_title_with_rank, -- Concatenating rank and title
total_sales -- Total sales of the bookFROM
GenreSales -- Referencing the CTEWHERE
rank <= 3 -- Filtering to top 3 books in each genre;
Explanation:
Imagine organizing a book competition in your bookstore, where books compete to be the best-sellers in their genre, like mystery, romance, or science fiction.
We calculate how much money each book made.
Then, for each genre, we give awards to the top three books – the ones that sold the most.
We end up with a list showing the winners in each genre, like “the top three mystery books” or “the top three romance novels.”
It’s like giving out medals to the best-selling books in each category of your bookstore.
Technical Breakdown:
This SQL query is structured to identify and rank the top three best-selling books within each genre based on their sales. Here's the technical breakdown:
First Step (CTE - GenreSales):
The GenreSales CTE calculates the total sales for each book within its genre.
The SUM(od.quantity * od.price_per_unit) computes the total sales for each book.
The RANK() OVER (PARTITION BY b.genre ORDER BY
SUM(od.quantity
*
od.price_per_unit) DESC) AS rank assigns a sales-based rank to each book within its respective genre.
Filtering and Final Selection:
The final SELECT statement retrieves the genre, a concatenated string of rank and title (CONCAT('Rank: ', rank,
'.
', title) AS book_title_with_rank), and total sales.
The WHERE rank <= 3 clause filters the results to only include the top three books in each genre.
Result:
The output is a list of the top three best-selling books in each genre, ranked by their total sales.
Task #2: Creating a Pivot Table of Customer Age Groups Split by Order Year
In this task, we'll utilize the PIVOT function in Google BigQuery, which is akin to creating pivot tables in Excel. The pivot table will arrange customers into age groups across columns while breaking down these groups by order year in rows.
Objective: Develop a pivot table in Google BigQuery that categorizes customers into different age groups and analyzes these groups based on the year they placed orders. This task is aimed at understanding customer behavior across various age demographics over time.
Expected Result: A pivot table that displays the distribution of customer orders across different age groups for each order year, providing insights into the purchasing patterns of various age demographics over time.
Note: For example, in 2020 customers aged 18-30 made 128 orders in total
Data Source: dataacademykz.bookstore.{table_name}
-- CTE to calculate order counts grouped by year and customer age group WITH OrderCounts AS (
SELECTEXTRACT(YEAR FROM o.order_date) AS year, -- Extracting year from order datCASEWHENEXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM c.date_of_birth) BETWEEN 18 AND 30 THEN '18-30' -- Classifying as '18-30WHENEXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM c.date_of_birth) BETWEEN 31 AND 45 THEN '31-45' -- Classifying as '31-45WHENEXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM c.date_of_birth) BETWEEN 46 AND 60 THEN '46-60' -- Classifying as '46-60ELSE '60+' -- Classifying as '60+'ENDAS age_group,
COUNT(DISTINCT(o.order_id)) AS order_count -- Counting ordersFROM
`dataacademykz.bookstore.customers` c
JOIN
`dataacademykz.bookstore.orders` o ON c.customer_id = o.customer_id
GROUP BY year, age_group
) SELECT * -- Selecting all fieldsFROM OrderCounts
PIVOT (
SUM(order_count) FOR age_group IN ('18-30', '31-45', '46-60', '60+')
) ORDER BY year -- Ordering by year;
Explanation:
Imagine you’re trying to understand who’s buying the most books from your store each year, sorted by how old they are.
We first group your customers into age categories like '18-30', '31-45', and so on.
Then, for each year, we count how many orders were made by customers in each age group.
We arrange this info like a table – each row is a year, and each column shows one age group.
It’s like creating a yearbook that tells you which age groups were most into buying books each year.
Technical Breakdown:
This SQL query is designed to evaluate the number of orders placed by different age groups of customers for each year. Here's the technical breakdown:
First Step (CTE - OrderCounts):
The OrderCounts CTE calculates the number of orders for each year, segmented into different customer age groups.
The CASE statement categorizes customers into age groups ('18-30', '31-45', '46-60', '60+') based on their age, which is calculated from their date of birth.
COUNT(o.order_id) AS order_count computes the number of orders for each age group in each year.
PIVOT Operation:
The PIVOT clause transforms the age group data into a more readable format where each age group has its column, and each row represents a year.
SUM(order_count) FOR age_group IN ('18-30', '31-45',
'46-60',
'60+') aggregates the order counts for each age group.
Final Query:
The main query selects all from OrderCounts and applies the PIVOT operation to organize the data by year and age group.
Result:
The output includes a year-wise breakdown of order counts, categorized by customer age groups, providing a clear view of the ordering patterns across different demographics.
Task #3: Analyzing authors' contributions to total genre sales
This task involves analyzing how much each author's books contribute to the total sales of their genre. By determining the percentage of genre sales attributable to each author, we can identify key contributors and understand their impact on the genre's market performance.
Objective: Evaluate the contribution of each author to the total sales within their respective book genres.
Expected Result: A detailed analysis showing each author's contribution to the total sales of their book genre, expressed as a percentage of the genre's total sales.
Data Source: dataacademykz.bookstore.{table_name}
-- Selecting author names, genres, and their sales as a percentage of total genre sales SELECT
a.name AS author_name, -- Author's name
b.genre, -- Genre of the bookROUND(SUM(od.quantity * od.price_per_unit) / g.genre_total_sales * 100, 2) AS percentage_of_genre_sales, -- Percentage of total sales in the genrFROM
`dataacademykz.bookstore.authors` a -- Authors tableJOIN
`dataacademykz.bookstore.books` b ON a.author_id = b.author_id -- Joining books with authorJOIN
`dataacademykz.bookstore.order_details` od ON b.book_id = od.book_id -- Joining order details with bookJOIN (
SELECT
genre,
SUM(quantity * price_per_unit) AS genre_total_sales -- Calculating total sales per genrFROM
`dataacademykz.bookstore.books` b
JOIN
`dataacademykz.bookstore.order_details` od ON b.book_id = od.book_id
GROUP BY genre
) g ON b.genre = g.genre -- Joining with subquery for total genre saleGROUP BY -- Grouping by author name, genre, and total genre sale
a.name, b.genre, g.genre_total_sales
ORDER BY
b.genre, percentage_of_genre_sales DESC-- Ordering by genre and descending percentage of genre sales
Explanation:
Imagine you’re analyzing a bookstore's sales and want to know how much each author contributes to the total sales of their book genre.
We first calculate how much money each genre made altogether.
Then, we figure out what percentage of these total sales each author contributed with their books.
The list shows each author with the part they played in their genre's sales, like “this author made up 5% of all romance book sales.”
It's like seeing which authors are the stars in their genre's sky, based on how much they sell.
Technical Breakdown:
This SQL query aims to calculate the contribution of each author's sales to the total sales of their respective genres. Here's the technical breakdown:
Join Operations:
The query joins the authors, books, and order_details tables to link each author to their books and corresponding sales data.
A subquery (aliased as g) calculates the total sales for each genre.
Subquery for Genre Sales:
The subquery aggregates sales data (SUM(quantity *
price_per_unit)) for each genre, providing a benchmark for comparing individual author sales.
Percentage Calculation:
The main query calculates each author's sales as a percentage of the total sales in their genre. This is done using ROUND(SUM(od.quantity * od.price_per_unit) /
g.genre_total_sales
*
100, 2).
Grouping and Ordering:
The GROUP BY clause groups the data by author name and genre, essential for accurate aggregation.
The ORDER BY clause sorts the results first by genre and then by the percentage contribution of each author within that genre.
Result:
The output includes each author's name, their book genre, and their sales as a percentage of total sales within that genre, providing insights into each author's market impact within their genre.