How to filter a table based on queried ids from another table in Snowflake-CodePudding

I'm trying to filter a table based on the queried result from another table.

create temporary table test_table (id number, col_a varchar);
insert into test_table values
    (1, 'a'), 
    (2, 'b'),
    (3, 'aa'),
    (4, 'a'),
    (6, 'bb'),
    (7, 'a'),
    (8, 'c');

create temporary table test_table_2 (id number, col varchar);
insert into test_table_2 values
    (1, 'aa'), 
    (2, 'bb'),
    (3, 'cc'),
    (4, 'dd'),
    (6, 'ee'),
    (7, 'ff'),
    (8, 'gg');

Here I want to find out all the id's in test_table with value "a" in col_a, and then I want to filter for rows with one of these id's in test_table_2. I've tried this below way, but got an error: SQL compilation error: syntax error line 6 at position 39 unexpected 'cte'.

with cte as 
(
    select id from test_table
    where col_a = 'a'
)
select * from test_table_2 where id in cte;

This approach below does work, but with large tables, it tends to be very slow. Is there a better more efficient way to scale to very large tables?

with cte as 
(
    select id from test_table
    where col_a = 'a'
)
select t2.* from test_table_2 t2 join cte on t2.id=cte.id;

CodePudding user response：

I would express this using exists logic:

SELECT id
FROM test_table_2 t2
WHERE EXISTS (
    SELECT 1
    FROM test_table t1
    WHERE t2.id = t1.id AND
          t1.col_a = 'a'
);

This has one advantage over a join in that Snowflake can stop scanning the test_table_2 table as soon as it finds a match.

CodePudding user response：

your first error can be fixed as below. Joins are usually better suited for lookups compared to exists or in clause if you have a large table.

with cte as 
(
    select id from test_table
    where col_a = 'a'
)
select * from test_table_2 where id in (select distinct id from cte);