I have a table structured like this:
day | item | buyer_id |
---|---|---|
19/10/2022 | Shoes | 58423401 |
19/10/2022 | Shoes | 58423402 |
19/10/2022 | Bikes | 58423403 |
19/10/2022 | Shoes | 58423404 |
20/10/2022 | Bikes | 58423405 |
20/10/2022 | Shoes | 58423406 |
I need to display my data like this: in the column on the right, I would like to sum up all buyers of that day across all items. This is my desired result, grouped by item:
day | item | number_of_buyers | total_number_of_buyers_per_day |
---|---|---|---|
19/10/2022 | Shoes | 5,000 | 55,000 |
19/10/2022 | Bikes | 50,000 | 55,000 |
20/10/2022 | Shoes | 45,000 | 95,000 |
20/10/2022 | Bikes | 50,000 | 95,000 |
However, this is what I keep getting:
day | item | number_of_buyers | total_number_of_buyers_per_day |
---|---|---|---|
19/10/2022 | Shoes | 5,000 | 5,000 |
19/10/2022 | Bikes | 50,000 | 50,000 |
20/10/2022 | Shoes | 45,000 | 45,000 |
20/10/2022 | Bikes | 50,000 | 50,000 |
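The output above repeats the per-item count because the last column is being computed at the (day, item) level rather than per day. One way to get the desired shape (a sketch, not taken from this thread) is to keep `GROUP BY day, item` and wrap the per-item count in a window `SUM` partitioned by day, since window functions are evaluated after grouping. The snippet runs this against the question's sample rows in an in-memory SQLite database (3.25 or later, for window functions) through Python's `sqlite3` module; the ISO-8601 date strings and the `per_item_with_daily_total` helper are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question. Dates are stored as ISO-8601 text (an assumption)
# so that ORDER BY day sorts chronologically.
SAMPLE_ROWS = [
    ("2022-10-19", "Shoes", 58423401),
    ("2022-10-19", "Shoes", 58423402),
    ("2022-10-19", "Bikes", 58423403),
    ("2022-10-19", "Shoes", 58423404),
    ("2022-10-20", "Bikes", 58423405),
    ("2022-10-20", "Shoes", 58423406),
]

# The window SUM is evaluated after GROUP BY, so it adds up the per-item
# counts of all grouped rows that share the same day.
WINDOW_SUM_QUERY = """
SELECT day,
       item,
       COUNT(DISTINCT buyer_id) AS number_of_buyers,
       SUM(COUNT(DISTINCT buyer_id)) OVER (PARTITION BY day)
           AS total_number_of_buyers_per_day
FROM buyers
GROUP BY day, item
ORDER BY day, item
"""


def per_item_with_daily_total(rows):
    """Hypothetical helper: load rows into an in-memory table and run the query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE buyers (day TEXT, item TEXT, buyer_id INTEGER)")
        con.executemany("INSERT INTO buyers VALUES (?, ?, ?)", rows)
        return con.execute(WINDOW_SUM_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in per_item_with_daily_total(SAMPLE_ROWS):
        print(row)
```

Because it adds up per-item counts, a buyer who bought both items on the same day is counted twice in the daily total. That matches the additive figures in the desired result (5,000 + 50,000 = 55,000); if each buyer should be counted only once per day, a per-day `COUNT(DISTINCT buyer_id)` is needed instead.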
So far, I have tried to get the total number of buyers into the last column with a join:
SELECT
a.day
, a.item
, COUNT(DISTINCT a.buyer_id) AS number_of_buyers
, COUNT(b.number_of_total_buyers) AS total_number_of_buyers_per_day
FROM
buyers AS a
LEFT JOIN
(
SELECT day, COUNT (DISTINCT buyer_id) AS number_of_total_buyers
FROM buyers
GROUP BY 1, 2
ORDER BY 1, 2
) AS b
ON a.buyer_id = b.buyer_id
AND a.day = b.day
GROUP BY
1, 2
ORDER BY
1, 2
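As written, the join above cannot produce a per-day total: it references `b.buyer_id`, which the derived table `b` does not expose, and joining on buyer ID would in any case tie each `b` row to a buyer rather than to a day. `COUNT(b.…)` also counts joined rows instead of reading the total. A minimal sketch of the same join-based idea groups the subquery by day only, joins on day only, and takes `MAX` of the single per-day value. It runs in an in-memory SQLite database via Python's `sqlite3` module; the ISO-8601 dates and the `run_join_query` helper are illustrative assumptions.

```python
import sqlite3

# Sample rows from the question, with dates stored as ISO-8601 text (an assumption).
SAMPLE_ROWS = [
    ("2022-10-19", "Shoes", 58423401),
    ("2022-10-19", "Shoes", 58423402),
    ("2022-10-19", "Bikes", 58423403),
    ("2022-10-19", "Shoes", 58423404),
    ("2022-10-20", "Bikes", 58423405),
    ("2022-10-20", "Shoes", 58423406),
]

# Join-based version: b holds exactly one row per day, the join matches on day
# only, and MAX() picks that day's total (every joined row carries the same value).
JOIN_QUERY = """
SELECT a.day,
       a.item,
       COUNT(DISTINCT a.buyer_id) AS number_of_buyers,
       MAX(b.number_of_total_buyers) AS total_number_of_buyers_per_day
FROM buyers AS a
LEFT JOIN (
    SELECT day, COUNT(DISTINCT buyer_id) AS number_of_total_buyers
    FROM buyers
    GROUP BY day
) AS b
    ON a.day = b.day
GROUP BY a.day, a.item
ORDER BY a.day, a.item
"""


def run_join_query(rows):
    """Hypothetical helper: load rows into an in-memory table and run JOIN_QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE buyers (day TEXT, item TEXT, buyer_id INTEGER)")
        con.executemany("INSERT INTO buyers VALUES (?, ?, ?)", rows)
        return con.execute(JOIN_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run_join_query(SAMPLE_ROWS):
        print(row)
```

This version counts each buyer once per day, even if they bought more than one item that day.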
Thanks for your help!
Answer:
Here's a generic solution that should work in most RDBMSs with window-function support. Some RDBMSs let you use count(distinct) inside a window function, which would make this simpler.
select day
,item
,count(distinct buyer_id) as number_of_buyers
,sum(count(distinct_buyers_by_day)) over(partition by day) as total_number_of_buyers_per_day
from
(
select day
,item
,buyer_id
,case when buyer_id != lag(buyer_id) over(partition by day order by buyer_id) or lag(buyer_id) over(partition by day order by buyer_id) is null then 1 end as distinct_buyers_by_day
from t
) t
group by day, item
order by day, item
day | item | number_of_buyers | total_number_of_buyers_per_day |
---|---|---|---|
2022-10-19 | Bikes | 1 | 4 |
2022-10-19 | Shoes | 3 | 4 |
2022-10-20 | Bikes | 1 | 2 |
2022-10-20 | Shoes | 1 | 2 |
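To check the query above against the question's sample rows, the sketch below runs it in an in-memory SQLite database via Python's `sqlite3` module, with the derived table renamed to `s` and an `order by` added for a stable row order. It also runs a swapped-in alternative that is not from the answer: the common `dense_rank` trick, where the ascending rank plus the descending rank minus 1 within a day equals the number of distinct `buyer_id` values in that day. This emulates `count(distinct)` over a window on engines that don't allow it, though unlike `count(distinct)` it would treat a NULL `buyer_id` as a value. The table name `t`, the ISO-8601 dates, and the `run` helper are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question, with dates stored as ISO-8601 text (an assumption).
SAMPLE_ROWS = [
    ("2022-10-19", "Shoes", 58423401),
    ("2022-10-19", "Shoes", 58423402),
    ("2022-10-19", "Bikes", 58423403),
    ("2022-10-19", "Shoes", 58423404),
    ("2022-10-20", "Bikes", 58423405),
    ("2022-10-20", "Shoes", 58423406),
]

# The answer's query: lag() flags the first row of each buyer_id within a day,
# so summing the per-item flag counts over the day gives distinct buyers per day.
LAG_QUERY = """
select day
      ,item
      ,count(distinct buyer_id) as number_of_buyers
      ,sum(count(distinct_buyers_by_day)) over(partition by day) as total_number_of_buyers_per_day
from
(
select day
      ,item
      ,buyer_id
      ,case when buyer_id != lag(buyer_id) over(partition by day order by buyer_id)
              or lag(buyer_id) over(partition by day order by buyer_id) is null then 1 end as distinct_buyers_by_day
from t
) s
group by day, item
order by day, item
"""

# Alternative (not from the answer): ascending + descending dense_rank - 1 equals
# the number of distinct buyer_id values in the day, repeated on every row of it.
DENSE_RANK_QUERY = """
select day
      ,item
      ,count(distinct buyer_id) as number_of_buyers
      ,max(distinct_buyers_in_day) as total_number_of_buyers_per_day
from
(
select day
      ,item
      ,buyer_id
      ,dense_rank() over(partition by day order by buyer_id)
       + dense_rank() over(partition by day order by buyer_id desc) - 1 as distinct_buyers_in_day
from t
) s
group by day, item
order by day, item
"""


def run(query, rows):
    """Hypothetical helper: load rows into an in-memory table t and run one query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (day TEXT, item TEXT, buyer_id INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for query in (LAG_QUERY, DENSE_RANK_QUERY):
        print(run(query, SAMPLE_ROWS))
```

Both queries count a buyer only once per day, so adding a second purchase by an existing buyer on the same day leaves the daily total unchanged.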