I have a table like below. Members can have purchases under mutliple categories. I want to get all category combinations for members who have had purchases in more than one categeory. That is, I want to get all combinations and their counts. For example, members A,B,C,F have had purhcase in three categories; member D only in one, and member E is in two categories.
category member quantity
Clothes A 1
Clothes B 2
Clothes C 3
Clothes D 1
Cards A 1
Cards B 1
Cards C 2
Cards E 3
Cards F 3
Trips A 1
Trips B 2
Trips C 2
Trips F 1
Dining E 2
Dining F 1
The output I want is like this:
Categories mem quant_total
Clothes Cards Trips 3 15 ==> members A B C
Cards Trips Dining 1 5 ==> Only member F
Cards Dining 1 5 ==> Only member E
I can get count of categories for members who appeared in multiple categories, but don't know how to get the above results.
SELECT category, count(category)
FROM table
WHERE member IN (SELECT member FROM table
GROUP BY member
HAVING count(distinct category) > 1 )
GROUP BY category
Plz note that category combinations can be larger like 4,5,6..., and this is a toy example.
CodePudding user response:
We'll start with the data.
Please provide the DDL for this next time. It is friendlier to those providing you the free labor.
with "table" (category, member, quantity) as (
select 'Clothes', 'A', 1
union select 'Clothes', 'B', 2
union select 'Clothes', 'C', 3
union select 'Clothes', 'D', 1
union select 'Cards', 'A', 1
union select 'Cards', 'B', 1
union select 'Cards', 'C', 2
union select 'Cards', 'E', 3
union select 'Cards', 'F', 3
union select 'Trips', 'A', 1
union select 'Trips', 'B', 2
union select 'Trips', 'C', 2
union select 'Trips', 'F', 1
union select 'Dining', 'E', 2
union select 'Dining', 'F', 1
),
Then we aggregate the cagetories and get the counts
For PostgreSQL:
counts as (
select member as mem
, STRING_AGG ( category, ', ' order by category ) as categories
, count(distinct category) as category_count
, sum(quantity) as quant_total
from "table"
group by member
)
or for Redshift:
counts as (
select member as mem
, LISTAGG( category , ',' ) WITHIN GROUP (ORDER BY category) as categories
, count(distinct category) as category_count
, sum(quantity) as quant_total
from "table"
group by member
)
Then we get the results:
select categories
, count(distinct mem) as mem
, sum(quant_total) as quant_total
from counts
where category_count > 1
group by categories
order by 1
I couldn't test the redshift code, but I tested this on an Oracle fiddle and it works.
with "table" (category, member, quantity) as (
select 'Clothes', 'A', 1 from dual
union select 'Clothes', 'B', 2 from dual
union select 'Clothes', 'C', 3 from dual
union select 'Clothes', 'D', 1 from dual
union select 'Cards', 'A', 1 from dual
union select 'Cards', 'B', 1 from dual
union select 'Cards', 'C', 2 from dual
union select 'Cards', 'E', 3 from dual
union select 'Cards', 'F', 3 from dual
union select 'Trips', 'A', 1 from dual
union select 'Trips', 'B', 2 from dual
union select 'Trips', 'C', 2 from dual
union select 'Trips', 'F', 1 from dual
union select 'Dining', 'E', 2 from dual
union select 'Dining', 'F', 1 from dual
),
counts as (
select member as mem
, listagg ( category, ', ') WITHIN GROUP (ORDER BY category ) as categories
, count(distinct category) as category_count
, sum(quantity) as quant_total
from "table"
group by member
)
select categories
, count(distinct mem) as mem
, sum(quant_total) as quant_total
from counts
where category_count > 1
group by categories
order by 1
CodePudding user response:
OK Here's my solution:
SELECT Categories, count(member), sum(memcat_total)
FROM (
SELECT group_concat(category ORDER BY category SEPARATOR " " ) AS Categories,
member,
sum(quantity) as memcat_total
FROM table
GROUP BY member
ORDER by categories
) AS SQL1
GROUP BY Categories;
Which gives the following:
EDIT SQL Modified to Postgres syntax (untested)
SELECT Categories, count(member), sum(memcat_total)
FROM (
SELECT ARRAY_AGG(category || ' ' ORDER BY category) AS Categories,
member,
sum(quantity) as memcat_total
FROM table
GROUP BY member
ORDER by categories
) AS SQL1
GROUP BY Categories;
EDIT 2: My attempt at a Redshift version:
SELECT Categories, count(member), sum(memcat_total)
FROM (
SELECT LISTAGG(category,' ') within group (order by category) AS Categories,
member,
sum(quantity) as memcat_total
FROM table
GROUP BY member
ORDER by categories
) AS SQL1
GROUP BY Categories;