I have a table historical_data
ID | Date | column_a | column_b |
---|---|---|---|
1 | 2011-10-01 | a | a1 |
1 | 2011-11-01 | w | w1 |
1 | 2011-09-01 | a | a1 |
2 | 2011-01-12 | q | q1 |
2 | 2011-02-01 | d | d1 |
3 | 2011-11-01 | s | s1 |
I need to retrieve the whole history of an id based on the date condition on any 1 row related to that ID.
date>='2011-11-01' should get me
ID | Date | column_a | column_b |
---|---|---|---|
1 | 2011-10-01 | a | a1 |
1 | 2011-11-01 | w | w1 |
1 | 2011-09-01 | a | a1 |
3 | 2011-11-01 | s | s1 |
I am aware you can get this by using a CTE or a subquery like
with selected_id as (
select id from historical_data where date>='2011-11-01'
)
select hd.* from historical_data hd
inner join selected_id si on hd.id = si.id
or
select * from historical_data
where id in (select id from historical_data where date>='2011-11-01')
In both these methods I have to query/scan the table ``historical_data``` twice. I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.
The table above is a sample table, the table I have is about to touch 1TB in size with upwards of 600M rows.
Is there any way to achieve this by only querying the table once? (I am using Snowflake)
CodePudding user response:
Using QUALIFY
:
SELECT *
FROM historical_data
QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;