I have a sample table below:
--------------------- --------- --------
| timestamp | id | balance|
|--------------------- --------- --------|
| 2022-08-01 00:00:00 | 1 | 0.01 |
| 2022-08-01 00:00:00 | 2 | 0 |
| 2022-08-02 00:00:00 | 2 | 0.005 |
| 2022-08-02 07:00:00 | 2 | 0.5 |
| 2022-08-02 07:00:00 | 1 | 0.15 |
| 2022-08-03 00:00:00 | 1 | 0.02 |
| 2021-08-03 01:00:00 | 1 | 0.03 |
--------------------- --------- --------
And I want the following output:
--------------------- --------- --------
| timestamp | id | balance|
|--------------------- --------- --------|
| 2022-08-01 00:00:00 | 1 | 0.01 |
| 2022-08-01 00:00:00 | 2 | 0 |
| 2022-08-02 07:00:00 | 2 | 0.5 |
| 2022-08-02 07:00:00 | 1 | 0.15 |
| 2021-08-03 01:00:00 | 1 | 0.03 |
--------------------- --------- --------
I want to get the latest balance for each id, per day. Ideally the sql would use a window function as the source data table is very large (~900GB) and I need something efficient that will scale.
I've spent a lot of time trying to figure this out and keep getting stuck, any help would be greatly appreciated! Thanks in advance
CodePudding user response:
Consider below simple approach
select * from your_table
qualify 1 = row_number() over id_day
window id_day as (partition by id, date(timestamp) order by timestamp desc)