My table looks like the following:
user_id | group | probability |
---|---|---|
123 | group1 | 0.9 |
123 | group2 | 0.6 |
45 | group2 | 0.8 |
567 | group2 | 0.56 |
567 | group3 | 0.78 |
567 | group1 | 0.90 |
I need to extract users who are just in group 1 and 2, which means I only need to retrieve user 123. I have written my query like the following:
with two_groups as (
select user_id
from table1
where group in ('group1', 'group2')
group by 1
having max(group) <> min(group) and count(user_id) = 2
)
select *
from two_groups
join table1 using (user_id)
The reason that I am joining it back to the table1 is because I could not add the group and probability columns as a field in the "two_groups" subquery because I didn't want to group by them.
So, the problem is with the query I have, it still retrieves user id 567. However, I don't want it to be extracted because it is in group3 as well. What Can I do to just extract the users being in just exactly two groups?
Thank you!
CodePudding user response:
You should try with a proper inner join
select *
from
two_groups t1 inner join
table1 t2 on t1.user_id = t2.user_id
CodePudding user response:
Perhaps:
with two_groups as (
select user_id
from table1
group by 1
having min(group) = 'group1' and
max(group) = 'group2' and
count(distinct group) = 2
)
select *
from two_groups
join table1 using (user_id)
This method wouldn't extend to "member of exactly three groups" though.
Also:
select *
from table1 t1
where exists (select 1 from table1 t2 where t2.user_id = t1.user_id and t2.group = 'group1')
and exists (select 1 from table1 t2 where t2.user_id = t1.user_id and t2.group = 'group2')
and not exists (select 1 from table1 t2 where t2.user_id = t1.user_id and t2.group not in ('group1', 'group2'))
CodePudding user response:
You can join what you have with select count(), user_id from table group by user_id having count() = 2. That will give you those who are in exactly 2 groups.
CodePudding user response:
Would this work? This first query is your original CTE, and should get all users that have a row in both groups The second query removed any users that also have a row in a group other than 1 or 2.
select user_id
from table1
where group in ('group1', 'group2')
group by user_id
having max(group) <> min(group)
and count(user_id) = 2
EXCEPT DISTINCT
select user_id
from table1
where group not in ('group1', 'group2')
CodePudding user response:
Consider below (BigQuery)
select *
from your_table
where true
qualify 2 = countif(`group` in ('group1', 'group2')) over(partition by user_id)
and 0 = countif(not `group` in ('group1', 'group2')) over(partition by user_id)
if applied to sample data in your question - output is