I'm working with the Iowa Liquor Sales dataset which in this case is called "bigquery-public-data.iowa_liquor_sales.sales". Relevant columns and their datatypes are date(DATE), sale_dollars(FLOAT), item_description(STRING), store_name(STRING).
I am trying to write a query that will return the top sale for each year, of the past three years (2021,2020,2019) along with the date, item_description, and store_name.
The below code works, but only covers one year. I know I could copy paste and change the date every time but that seems tedious. Is there a better way?
SELECT date, sale_dollars, item_description, store_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2021-01-01' and '2021-12-31'
ORDER BY sale_dollars DESC
LIMIT 1
date | sale_dollars | item_description | store_name |
---|---|---|---|
2021-04-19 | 250932.0 | Titos Handmade Vodka | Hy-Vee #3 |
When trying different ways to write it so the max sale of 2019,2020, and 2021 return along with their date, item_description, and store_name, I ran into errors. The below is the closest I got (missing date, item_description, and store_name).
SELECT
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2021-01-01' and '2021-12-31') as sale_2021,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2020-01-01' and '2020-12-31') as sale_2020,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2019-01-01' and '2019-12-31') as sale_2019
How can I write a query that returns the max sale of the past three years along with it's date, item, and store name?
CodePudding user response:
Consider below query
SELECT EXTRACT(YEAR FROM date) year,
ARRAY_AGG(
STRUCT(date, sale_dollars, item_description, store_name)
ORDER BY sale_dollars DESC LIMIT 1
)[OFFSET(0)].*
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date BETWEEN '2019-01-01' AND '2021-12-31'
GROUP BY 1;
Query results
------ ------------ -------------- ---------------------- -------------------------------
| year | date | sale_dollars | item_description | store_name |
------ ------------ -------------- ---------------------- -------------------------------
| 2020 | 2020-10-08 | 250932.0 | Titos Handmade Vodka | Hy-Vee #3 / BDI / Des Moines |
| 2019 | 2019-10-08 | 78435.0 | Makers Mark | Hy-Vee Food Store / Urbandale |
| 2021 | 2021-07-05 | 250932.0 | Titos Handmade Vodka | Hy-Vee #3 / BDI / Des Moines |
------ ------------ -------------- ---------------------- -------------------------------
or, you can get same result with a window function
SELECT date, sale_dollars, item_description, store_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date BETWEEN '2019-01-01' AND '2021-12-31'
QUALIFY ROW_NUMBER() OVER (
PARTITION BY EXTRACT(YEAR FROM date) ORDER BY sale_dollars DESC
) = 1;
CodePudding user response:
As the three values deliver only one value, you can add them to the first query, only adapted to three years
SELECT
date, sale_dollars, item_description, store_name,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2021-01-01' and '2021-12-31') as sale_2021,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2020-01-01' and '2020-12-31') as sale_2020,
(SELECT MAX(sale_dollars)
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2019-01-01' and '2019-12-31') as sale_2019
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE date between '2019-01-01' and '2021-12-31'
ORDER BY sale_dollars DESC
LIMIT 1