EDIT: Made a few clarifications in the post below
I am trying to work on a problem where I have a table that has missing timestamps. Let's say there is a table like this:
Timestamp | NumericField |
---|---|
2021-10-24 16:59:00.000 | 101 |
2021-10-24 16:57:00.000 | 101 |
I would like to try and do 2 things:
Populate a third record where the timestamp would be 2021-10-24 16:58:00.000.
Along with that, I would like to populate the NumericField field as 101 if the lead and lag records match, like in this example. The result would be:
Timestamp | NumericField |
---|---|
2021-10-24 16:59:00.000 | 101 |
2021-10-24 16:58:00.000 | 101 |
2021-10-24 16:57:00.000 | 101 |
If the lead and lag NumericField records do not match, then the NumericField that was generated would result in null. Example being:
Timestamp | NumericField |
---|---|
2021-10-24 16:59:00.000 | 101 |
2021-10-24 16:58:00.000 | NULL |
2021-10-24 16:57:00.000 | 100 |
The reason I am posting this question is because recursive CTE's isn't supported in Presto and I couldn't find any good resources to help me with this problem.
CodePudding user response:
I would try using lag
to find previous values and then sequence
to generate array of dates with interval '1' minute
step, unnest it and union the result with original table:
WITH dataset (Timestamp, NumericField) AS (
VALUES (timestamp '2021-10-24 16:59:00.000', 101),
(timestamp '2021-10-24 16:57:00.000', 101),
(timestamp '2021-10-24 16:55:00.000', 99)
)
SELECT date as Timestamp,
val as NumericField
FROM (
SELECT array_except(
sequence(prev_ts, Timestamp, interval '1' minute),
array [ prev_ts, timestamp ] -- exclude border values
) as dates,
case
NumericField
when prev_num then prev_num
end as val
FROM (
SELECT *,
lag(Timestamp) over(order by Timestamp) prev_ts,
lag(NumericField) over(order by Timestamp) prev_num
FROM dataset
)
) seq
CROSS JOIN UNNEST(dates) AS t (date)
UNION
SELECT *
FROM dataset
ORDER BY timestamp
Output:
Timestamp | NumericField |
---|---|
2021-10-24 16:55:00.000 | 99 |
2021-10-24 16:56:00.000 | |
2021-10-24 16:57:00.000 | 101 |
2021-10-24 16:58:00.000 | 101 |
2021-10-24 16:59:00.000 | 101 |