I have the following data structure:
table user
user_id | year_month | fruit
------------------------------
1 | 2021-01 | orange
1 | 2021-01 | apple
1 | 2021-01 | melon
1 | 2021-01 | orange
1 | 2021-02 | kiwi
1 | 2021-02 | orange
1 | 2021-02 | banana
1 | 2021-02 | melon
1 | 2021-02 | orange
1 | 2021-03 | melon
1 | 2021-03 | orange
1 | 2021-03 | melon
1 | 2021-03 | banana
1 | 2021-03 | apple
I want to generate this output
user_id | year_month | fruits | most_frequent
---------------------------------------------------------------------
1 | 2021-01 | orange, apple, melon | orange
1 | 2021-02 | kiwi, orange, banana, melon | orange
1 | 2021-03 | melon, orange, banana, apple | orange
It is basically a grouping of the fruits consumed by year_month
. Notice that the last column it is the most consumed fruit of all the time and not only the month.
This is my code so far:
WITH user AS (
-- logic to retrieve the results from 2 other tables and aggregate into 1 temporary table
-- contains a function inside
), ranking AS (
SELECT
user_id,
fruit,
COUNT(*) AS most_frequent,
FROM
user
GROUP BY
1, 2
ORDER BY
3 desc
-- LIMIT 1
)
SELECT
user.user_id,
user.year_month,
STRING_AGG(distinct user.fruit, ', ') as fruits,
FROM user
LEFT JOIN ranking
ON user.user_id = ranking.user_id
WHERE user.user_id = 1
GROUP BY 1
, 2
ORDER BY 2
and this is my output:
user_id | year_month | fruits
-----------------------------------------------------
1 | 2021-01 | orange, apple, melon
1 | 2021-02 | kiwi, orange, banana, melon
1 | 2021-03 | melon, orange, banana, apple
I'm having problems with the most frequent column. My initial solution was to uncomment that limit 1
line in the WITH ranking AS
and include ranking.fruit
into the main select, but this way I got no data returned.
To be more confusing for me, if I run this ranking
code outside the with
clause - including with the limit
line, I got the desired grouping I mentioned in the 'initial solution'. So the problem seems to be related to some join question.
CodePudding user response:
Consider below approach
select * from (
select user_id, year_month,
string_agg(distinct fruit) as fruits
from user
group by user_id, year_month
) join (
select user_id, fruit
from user
group by user_id, fruit
qualify 1 = row_number() over(partition by user_id order by count(*) desc)
)
using (user_id)
if applied to sample data in your question - output is