Can someone help me out with this sql query on postgres which I have to write but I just can't come up with, I have tried my best to simplify the problem from 1 million records and more constraints to this, I know this looks easy, but I am still unable to resolve this somehow :-
Table_name = t
Column_1_name = id
Column_2_name = st
Column_1_elements = [1,1,1,1,2,2,2,3,3]
Column_2_elements = [a,b,c,d,a,c,d,b,d]
Now I want to print to those distinct ids from id where they do not have their corresponding st equals to 'b' or 'a'.
For example, for the above example, the ouput should be [2,3] as 2 does not have corresponding 'b' and 3 does not have 'a'. [even though 3 does not have c also, but we are not concerned about 'c']. id=1 is not returned in solution as it has a relation with both 'a' and 'b'.
Let me know if you need more clarity. Thanks in advance for helping.
edit1:- The number of elements for id = 1,2,3 could be anything. I just want those ids where there corresponding st does not "contain" 'a' or 'b'. if there is an id=4 which has just one st which is 'r', and there is an id=5 which contains 'a','b','c','d','e','f','k','z'. Then we want id=4 in the output as well as it does not contain 'a' or 'b'..
CodePudding user response:
You might need to correct the syntax a little bit based on you SQL engine but this one is a working solution in Google BigQuery -
with temp as (
select 1 as id, 'a' as st union all
select 1 as id, 'b' as st union all
select 1 as id, 'c' as st union all
select 1 as id, 'd' as st union all
select 2 as id, 'a' as st union all
select 2 as id, 'c' as st union all
select 2 as id, 'd' as st union all
select 3 as id, 'b' as st union all
select 3 as id, 'd' as st union all
select 4 as id, 'e' as st union all
select 5 as id, 'g' as st union all
select 5 as id, 'h' as st
)
-- add 2 columns for is_a and is_b flags
, temp2 as (
select *
, case when st = 'a' then 1 else 0 end is_a
,case when st = 'b' then 1 else 0 end as is_b
from temp
)
-- IDs that have both the flags as 1 should be filtered out (like ID = 1)
select id
from temp2
group by 1
having max(is_a) max(is_b) < 2
This solution takes care of the problem you mentioned with ID 4 . Let me know if this works for you.
CodePudding user response:
See if this works:
create table t (id integer, st varchar);
insert into t values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'c'), (2, 'd'), (3, 'b'), (3, 'd'), (4, 'r');
insert into t values (5, 'a'), (5, 'b'), (5, 'c'), (5, 'd'), (5, 'e'), (5, 'f'), (5, 'k'), (5, 'z');
select id, array['a', 'b'] <@ array_agg(st)::text[] as tf from t group by id;
id | tf
---- ----
3 | f
5 | t
4 | f
2 | f
1 | t
select * from (select id, array['a', 'b'] <@ array_agg(st)::text[] as tf from t group by id) as agg where agg.tf = 'f';
id | tf
---- ----
3 | f
4 | f
2 | f
In the first select query the array_agg(st)
aggregates all the st
values for an id
via the group by id
. array['a', 'b'] <@ array_agg(st)::text[]
then asks if the a
and b
are both in the array_agg
.
The query is then turned into a sub-query where the outer query selects those rows that where 'f'
(false), in other words did not have both a
and b
in the aggregated id
values.