My goal is to select 1 category each month with the most quantity sold.
The code:
client.query('''SELECT year,
month,
category,
num_of_product
FROM (SELECT EXTRACT(year from created_at) as year,
EXTRACT(month from created_at) as month,
b.category,
COUNT(b.category) AS num_of_product,
FROM `bigquery-public-data.thelook_ecommerce.order_items` as a
INNER JOIN `bigquery-public-data.thelook_ecommerce.products` as b
ON a.product_id=b.id
WHERE status='Complete'
AND created_at BETWEEN "2022-01-01" AND "2022-09-30"
GROUP BY year, month, b.category)
WHERE month=1
ORDER BY year ASC, month ASC, num_of_product DESC''') \
.to_dataframe()
What I've tried:
- Use 3 nested query with PARTITION BY. The problem is, I only can get the maximum quantity sold per month. I can't display which category that have
132
quantity sold.
client.query('''SELECT *
FROM (SELECT year,
month,
MAX(num_of_product) OVER (PARTITION BY year, month) as num_of_productz
FROM (SELECT EXTRACT(year from created_at) as year,
EXTRACT(month from created_at) as month,
b.category,
COUNT(b.category) AS num_of_product,
FROM `bigquery-public-data.thelook_ecommerce.order_items` as a
INNER JOIN `bigquery-public-data.thelook_ecommerce.products` as b
ON a.product_id=b.id
WHERE status='Complete'
AND created_at BETWEEN "2022-01-01" AND "2022-09-30"
GROUP BY year, month, b.category)
GROUP BY year, month, num_of_product)
GROUP BY year, month, num_of_productz
ORDER BY year ASC, month ASC''') \
.to_dataframe()
Anyway, maybe I did something wrong. I will rewrote the code with PARTITION BY after I take my laundry.
The expected result:
year | month | category | num_of_product
2022 | 1 | Intimates | 132
The actual result:
year | month | num_of_product
2022 | 1 | 132
CodePudding user response:
this query should do it for you. The result contains a "part_max" column which is just the maximum num_of_product per "year/month" combination. And the "part" column just represents if that row is the maximum or not per "year/month" combination.
SELECT *
FROM
(SELECT *,
DENSE_RANK() over (partition by year, month order by num_of_product desc) part,
MAX(num_of_product) over (partition by year, month) part_max
FROM
(SELECT
EXTRACT(year FROM created_at) AS year,
EXTRACT(month FROM created_at) AS month,
b.category,
COUNT(b.category) AS num_of_product,
FROM `bigquery-public-data.thelook_ecommerce.order_items` AS a
INNER JOIN `bigquery-public-data.thelook_ecommerce.products` AS b
ON a.product_id=b.id
WHERE status='Complete'
AND created_at BETWEEN "2022-01-01"
AND "2022-09-30"
GROUP BY
year,
month,
b.category) t
) t
WHERE part = 1
ORDER BY year, month
Output is: