I'm trying to do something similar to
What I want as result is something like this:
That is, for example, only one user has only category D (and no other), two users have categories 10 and 30, and so on.
The main problem is that I have a lot of categories (over 40). Previously I had done something like:
SELECT sum(cat1), sum(cat2), sum(cat3)
FROM table
where cat1 = 0 and cat2 = 1 and cat3 = 0
That way worked, but is too manual and impossible to do it right now because I have lots of categories. Want to use BigQuery if it is possible.
CodePudding user response:
FWIW:
with mytable as (
select 'D' as Usr, '10' as Categories union all
select 'E', '10' union all
select 'E', '30' union all
select 'F', '30' union all
select 'G', '10' union all
select 'G', '50' union all
select 'H', '10' union all
select 'H', '30'
)
select grp, count(*) as cnt
from (
select Usr, string_agg(Categories order by Categories) as grp
from mytable
group by Usr
)
group by grp
CodePudding user response:
The main problem is that I have a lot of categories (over 40).
Consider below (BigQuery) approach - works for any reasonable amount of categories
execute immediate (
select '''
select * from (
select distinct t1.usr,
t1.categories category, t2.categories category2
from `your_table` t1 left join `your_table` t2
on t1.usr = t2.usr and t1.categories != t2.categories
union all
select usr, any_value(categories) category, any_value(categories) category2
from `your_table`
group by usr
having count(1) = 1
)
pivot (count(usr) cat for category2 in (''' || list || '''))
order by category
'''
from (
select string_agg("'" || categories || "'" order by categories) list
from (select distinct categories from `your_table`)
)
)
if applied to sample data in your question - output is
CodePudding user response:
This isn't exactly what you're looking for but you can use this output as a source to an Excel Pivot or BI tool to get exactly what you want. Performing pivots on 40 columns in SQL is