Home > Mobile >  Find Overlapping within categories in SQL Big Query
Find Overlapping within categories in SQL Big Query

Time:09-30

I'm trying to do something similar to enter image description here

What I want as result is something like this:

enter image description here

That is, for example, only one user has only category D (and no other), two users have categories 10 and 30, and so on.

The main problem is that I have a lot of categories (over 40). Previously I had done something like:

SELECT sum(cat1), sum(cat2), sum(cat3)
FROM  table
where cat1 = 0 and cat2 = 1 and cat3 = 0

That way worked, but is too manual and impossible to do it right now because I have lots of categories. Want to use BigQuery if it is possible.

CodePudding user response:

FWIW:

with mytable as (
    select 'D' as Usr, '10' as Categories union all 
    select 'E', '10' union all
    select 'E', '30' union all
    select 'F', '30' union all
    select 'G', '10' union all
    select 'G', '50' union all
    select 'H', '10' union all
    select 'H', '30'
)
select grp, count(*) as cnt
from (
    select Usr, string_agg(Categories order by Categories) as grp
    from mytable
    group by Usr
)
group by grp

enter image description here

CodePudding user response:

The main problem is that I have a lot of categories (over 40).

Consider below (BigQuery) approach - works for any reasonable amount of categories

execute immediate (
select '''
  select * from (
    select distinct t1.usr, 
      t1.categories category, t2.categories category2
    from `your_table` t1 left join `your_table` t2 
    on t1.usr = t2.usr and t1.categories != t2.categories
    union all
    select usr, any_value(categories) category, any_value(categories) category2
    from `your_table`
    group by usr
    having count(1) = 1
  )
  pivot (count(usr) cat for category2 in (''' || list || '''))
  order by category
'''
from (
  select string_agg("'" || categories || "'" order by categories) list 
  from (select distinct categories from `your_table`)
  )
)     

if applied to sample data in your question - output is

enter image description here

CodePudding user response:

This isn't exactly what you're looking for but you can use this output as a source to an Excel Pivot or BI tool to get exactly what you want. Performing pivots on 40 columns in SQL is enter image description here

  • Related