Data analysts frequently need to write SQL queries involving dates. In this lesson, we will explore various date functions. As always, we'll learn through hands-on tasks.
Task #1: Assess Shipping Efficiency by Comparing Order and Ship Dates
In today's competitive business environment, shipping efficiency is a vital metric for customer satisfaction and operational success. This task focuses on evaluating the time taken to ship orders by comparing the order and ship dates. By classifying the shipping duration into categories, we can better understand and improve our shipping processes.
Objective: Analyze the shipping efficiency by calculating the time difference between the Order Date and Ship Date, and categorize each shipment as 'Fast,' (up to 2 days) 'Medium,' (3-5 days) or 'Slow' (more than 5 days) based on the number of days taken for shipping.
Expected Result: The analysis will categorize shipments into three categories of shipping speed, providing valuable insights into the performance and efficiency of the shipping process.
Data Source:dataacademykz.superstore.sales_analytics
SELECT
Order_ID,
Order_Date,
Ship_Date,
DATE_DIFF(Ship_Date, Order_Date, DAY) AS Shipping_Days,
CASEWHENDATE_DIFF(Ship_Date, Order_Date, DAY) <= 2 THEN 'Fast'
WHENDATE_DIFF(Ship_Date, Order_Date, DAY) BETWEEN 3 AND 5 THEN 'Medium'
ELSE
'Slow'
ENDAS Shipping_Speed
FROM
dataacademykz.superstore.sales_analytics
Explanation:
Imagine you’re running a delivery service, and you want to make sure your customers are getting their orders quickly. This query is like checking a stopwatch for each order – how long did it take from the time the order was placed until it was shipped out?
We're timing each order's journey and then labeling the delivery speed: super quick (Fast), average (Medium), or a bit slow (Slow).
This way, you can see which orders zoomed out the door and which ones took their time, helping you figure out where to speed things up.
Technical Breakdown:
This SQL query is tailored to evaluate shipping efficiency by measuring the time taken between order placement and shipment. Here's the technical breakdown:
Date Difference Calculation: The DATE_DIFF(Ship_Date, Order_Date, DAY) AS Shipping_Days expression calculates the number of days between the Order_Date and Ship_Date for each order.
Categorization: The CASE statement categorizes the shipping duration into three categories based on the calculated number of days:
Orders shipped in 2 days or less are categorized as 'Fast'.
Orders shipped in 3 to 5 days are categorized as 'Medium'.
Orders taking more than 5 days to ship are categorized as 'Slow'.
Result: The output includes the Order_ID, Order_Date, Ship_Date, number of shipping days, and the categorized shipping speed for each order.
Task #2: Analyze Sales by Sub-Category within the Technology Category
In today's data-driven environment, it's crucial for data analysts to dissect sales data for specific insights. We aim to understand the sales trends for each sub-category within 'Technology' across different time periods.
Objective: Calculate the total sales for each sub-category within the 'Technology' category, segmented by year and month.
Expected Result: The analysis will provide insights into the performance of different sub-categories in the 'Technology' category over time, helping to identify trends, seasonal variations, and growth patterns.
Data Source:dataacademykz.superstore.sales_analytics
SELECTEXTRACT(YEAR FROM Order_Date) AS Order_Year,
EXTRACT(MONTH FROM Order_Date) AS Order_Month,
Sub_Category,
SUM(Sales) AS Total_Sales
FROM
dataacademykz.superstore.sales_analytics
WHERE
Category = 'Technology'
GROUP BY
Order_Year, Order_Month, Sub_Category
ORDER BY
Order_Year, Order_Month, Sub_Category
Explanation:
Think of this query as creating a detailed report card for technology products in your store, showing how well each type of product (sub-category) sold, month by month, year by year.
We’re focusing on just the technology section – like computers, gadgets, etc.
For each type of tech product, we add up all the sales for each month of each year.
Then, we lay it all out in an easy-to-read format, showing which products sold the most, when they sold best, and spotting any interesting patterns, like holiday sales spikes.
Technical Breakdown:
This SQL query is designed to explore sales trends for various sub-categories under the 'Technology' category, broken down by year and month.
Date Extraction and Aggregation: The query uses EXTRACT(YEAR FROM Order_Date) AS Order_Year and EXTRACT(MONTH FROM Order_Date) AS Order_Month to separate the order dates into years and months. It then computes the sum of sales for each sub-category (SUM(Sales) AS Total_Sales) within these time segments.
Filtering and Grouping: The WHERE Category =
'Technology' clause focuses the analysis on the 'Technology' category. The GROUP BY Order_Year, Order_Month,
Sub_Category clause ensures that sales are aggregated and analyzed for each sub-category within each time period.
Sorting: The ORDER BY Order_Year, Order_Month,
Sub_Category clause arranges the results chronologically and then by sub-category, facilitating an easy interpretation of the sales trends over time.
Result: The output provides a detailed breakdown of sales by sub-category in the 'Technology' category, segmented by year and month, revealing sales performance and trends.
In Google BigQuery, the EXTRACT function is used to retrieve specific parts of a date, time, datetime, or timestamp field. This function is particularly useful for breaking down a date or time into its components, such as the year, month, day, etc., which can be crucial for data analysis and reporting. The function works by specifying the part you want to extract and the date/time field from which to extract it.
Syntax:
EXTRACT(part FROM date_column)
Extracts the selected part from a date or timestamp
Task #3: Analyze Sales by Sub-Category within the Technology Category using the DATETIME_TRUNC Function
In data analysis, especially when dealing with time-based data, tools like the DATETIME_TRUNC function in SQL can greatly simplify and enhance the analysis process. This function allows us to truncate detailed datetime data to a specific level (like year, month, or day), making it easier to analyze trends over time. This task involves using DATETIME_TRUNC to analyze sales in the 'Technology' category on a monthly basis.
Objective: Utilize the DATETIME_TRUNC function to calculate and analyze the monthly sales totals for each sub-category within the 'Technology' category.
Expected Result: The analysis will provide a clearer understanding of the monthly sales trends for each sub-category in the 'Technology' category.
Data Source:dataacademykz.superstore.sales_analytics
SELECTFORMAT_DATETIME('%Y-%m', DATETIME_TRUNC(Order_Date, MONTH)) AS Truncated_Month,
Sub_Category,
SUM(Sales) AS Total_Sales
FROM
dataacademykz.superstore.sales_analytics
WHERE
Category = 'Technology'
GROUP BY
Truncated_Month, Sub_Category
ORDER BY
Truncated_Month, Sub_Category
Explanation:
Imagine you’re flipping through a calendar, marking down how much of each type of tech product (like laptops, phones, etc.) you sold each month. The DATETIME_TRUNC function is like a highlighter that helps you focus only on the month, not the specific days.
For each type of technology product, we add up all the sales for each month.
Then, we list them, month by month, showing you a clear picture of what’s selling when in the tech department.
Technical Breakdown:
This SQL query employs the DATETIME_TRUNC function to simplify the analysis of sales data over time, specifically targeting monthly sales in the 'Technology' category. Here's the technical breakdown:
Use of DATETIME_TRUNC: The DATETIME_TRUNC(Order_Date, MONTH) function truncates the Order_Date to the first day of the month, effectively grouping data by month. The FORMAT_DATETIME('%Y-%m', ...) then formats this date to a more readable year-month format (e.g., '2021-07').
Aggregation and Grouping: The query computes the sum of sales (SUM(Sales) AS Total_Sales) for each sub-category, grouped by the truncated month. The GROUP BY
Truncated_Month,
Sub_Category clause ensures that sales are aggregated for each sub-category within each monthly period.
Sorting: The ORDER BY Truncated_Month,
Sub_Category clause organizes the results chronologically by month and then by sub-category, allowing for a clear understanding of sales trends over time.
Result: The output provides a streamlined and chronological view of monthly sales figures by sub-category in the 'Technology' category, facilitating the analysis of sales patterns and trends.
Task #4: Comparative Analysis of Weekday vs. Weekend Sales
Understanding consumer behavior requires dissecting sales patterns, which often show significant differences between weekdays and weekends. This task aims to provide insights into these variations by categorizing sales based on the day of the week and analyzing the average sales for weekdays and weekends.
Objective: Differentiate sales based on whether they occur on weekdays or weekends and calculate the average sales for these two categories.
Expected Result: The analysis will reveal the average sales figures for weekday and weekend sales, providing a clearer understanding of consumer purchasing patterns during these periods.
Data Source:dataacademykz.superstore.sales_analytics
SELECTCASEWHENEXTRACT(DAYOFWEEK FROM Order_Date) IN (1, 7) THEN 'Weekend'
ELSE
'Weekday'
ENDAS Day_Type,
ROUND(AVG(Sales), 1) AS Average_Sales
FROM
dataacademykz.superstore.sales_analytics
GROUP BY
Day_Type
Explanation:
Think of this query as analyzing when people tend to shop more at your store: during the busy weekdays or the relaxed weekends.
We look at each sale and categorize it based on when it happened: either a weekday or a weekend.
Then, we calculate the average amount spent on purchases for each of these two categories.
It’s like taking a snapshot of shopping habits to see if people spend more on lazy Sundays or during their weekday routines.
Technical Breakdown:
This SQL query performs an analysis to compare average sales figures between weekdays and weekends, utilizing the DAYOFWEEK function. Here's the technical breakdown:
Use of DAYOFWEEK and CASE: The query uses EXTRACT(DAYOFWEEK FROM Order_Date) to determine the day of the week for each order date. The CASE statement then categorizes each sale as 'Weekend' (for days 1 (Sunday) and 7 (Saturday)) or 'Weekday' (for days 2 to 6).
Aggregation and Grouping: The ROUND(AVG(Sales),
1) AS Average_Sales computes the average sales for each category ('Weekday' or 'Weekend'). The GROUP BY Day_Type clause ensures that sales are aggregated and averaged separately for weekdays and weekends.
Result: The output provides the average sales figures for weekdays and weekends, offering insights into consumer purchasing behavior during these distinct time periods.