Suppose I have the following table:
Day | ID | Value |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
Now given a time window of 1 day, I want to create a time-series table that:
- The
Day
column granular unit is 1 day - Each
Day
row displays every ID in the table (like cross-join) - Moreover, if for that day, the ID is not recorded, then it uses the
Value
from the previous day. If it does not exist before this day, we can ignore it.
Let's say I want to view this time series from 2022-11-05
to 2022-11-08
, this is the desired output:
Day | ID | Value |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
2022-11-07 | 1 | B |
2022-11-08 | 0 | C |
2022-11-08 | 1 | B |
Explanation: ID=0
is not recorded on 11-06 so it uses the value from the previous day. ID=1
does not record new value on 11-07 so it uses the value from 11-06.
Note that the number of columns can be large, so if possible, I am looking for a solution that handles it too.
CodePudding user response:
Way One:
- first we start with some
data
- then we find
the_days
in the period we are interested in - then we find the
data_start
for each id - then we join those values together, and use
LAG
with theIGNORE NULLS OVER
clause to find the "prior values" if the current values in not present viaNVL
with data(Day, ID, Value) as (
select * from values
('2022-11-05'::date, 0, 'A'),
('2022-11-06'::date, 1, 'B'),
('2022-11-07'::date, 0, 'C')
), the_days as (
select
row_number() over (order by null)-1 as rn
,dateadd('day', rn, from_day) as day
from (
select
min(day) as from_day
,'2022-11-08' as to_day
,datediff('days', from_day, to_day) as days
from data
), table(generator(ROWCOUNT => 200))
qualify rn <= days
), data_starts as (
select
id,
min(day) as start_day
from data
group by 1
)
select
td.day,
ds.id,
nvl(d.value, lag(d.value) ignore nulls over (partition by ds.id order by td.day)) as value
from data_starts as ds
join the_days as td
on td.day >= ds.start_day
left join data as d
on ds.id = d.id and d.day = td.day
order by 1,2;
gives:
DAY | ID | VALUE |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
2022-11-07 | 1 | B |
2022-11-08 | 0 | C |
2022-11-08 | 1 | B |
Way Two:
with data(Day, ID, Value) as (
select * from values
('2022-11-05'::date, 0, 'A'),
('2022-11-06'::date, 1, 'B'),
('2022-11-07'::date, 0, 'C')
), the_days as (
select
dateadd('day', row_number() over (order by null)-1, '2022-11-05') as day
from table(generator(ROWCOUNT => 4))
)
select
td.day,
i.id,
nvl(d.value, lag(d.value) ignore nulls over (partition by i.id order by td.day)) as _value
from the_days as td
cross join (select distinct id from data) as i
left join data as d
on i.id = d.id and d.day = td.day
qualify _value is not null
order by 1,2;
this requires a unique name for the _values
output so it can be referenced in the qualify without needing to duplicate the code.