I have a table with columns timestamp
, price
. I want to find whether the price went up or down after 1 second. The timestamp is in snowflake's TIMESTAMP_TZ format, i.e., 2022-12-12 9:30:00.0 01:00
.
If I were to do this using a for loop, it would look something like this:
# pseudocode in python
price_diff = {}
for i in range(len(table)):
current_row = table[i]
for j in range(i, -1, -1):
prev_row = table[j]
if current_row.datetime - prev_row.datetime >= 1 second:
price_diff[current_row.index] = current_row.price - prev_row.price
break
Here's what I've tried:
SELECT table1.datetime, table1.price,
(
SELECT table2.price
FROM mytable table2
WHERE
timediff(second, table1.datetime, table2.datetime) > 1
ORDER BY
table2.datetime
DESC
LIMIT 1
) AS price_tm1,
(table1.price - price_tm1) AS price_diff
FROM
mytable table1
This gives me an error saying:
SQL compilation error: Unsupported subquery type cannot be evaluated
.
Any suggestions would be much appreciated!
CodePudding user response:
One way it to use lead to get the "next" but this will only find when the inter message values is over 1 second, thus with this data:
with mytable(stock, datetime, price) as (
select * from values
(1, '2022-12-13 12:31:45.00'::timestamp, 10.0),
(1, '2022-12-13 12:31:45.01'::timestamp, 10.1),
(1, '2022-12-13 12:31:45.02'::timestamp, 10.2),
(1, '2022-12-13 12:31:48.00'::timestamp, 11.0)
)
select *
,lead(datetime) over (partition by stock order by datetime) as lead_datetime
,lead(price) over (partition by stock order by datetime) as lead_price
,timediff(second, table1.datetime, lead_datetime) as gap
,iff(gap > 1, lead_price, null) as next_second_plus_price
from mytable as table1
order by datetime
;
STOCK | DATETIME | PRICE | LEAD_DATETIME | LEAD_PRICE | GAP | NEXT_SECOND_PLUS_PRICE |
---|---|---|---|---|---|---|
1 | 2022-12-13 12:31:45.000 | 10 | 2022-12-13 12:31:45.010 | 10.1 | 0 | null |
1 | 2022-12-13 12:31:45.010 | 10.1 | 2022-12-13 12:31:45.020 | 10.2 | 0 | null |
1 | 2022-12-13 12:31:45.020 | 10.2 | 2022-12-13 12:31:48.000 | 11 | 3 | 11 |
1 | 2022-12-13 12:31:48.000 | 11 | null | null | null | null |
but if you really want the next per row, then you want to use a join, and the filter away the unwanted rows with a qualify:
with mytable(stock, datetime, price) as (
select * from values
(1, '2022-12-13 12:31:45.00'::timestamp, 10.0),
(1, '2022-12-13 12:31:45.01'::timestamp, 10.1),
(1, '2022-12-13 12:31:45.02'::timestamp, 10.2),
(1, '2022-12-13 12:31:46.00'::timestamp, 11.0),
(1, '2022-12-13 12:31:46.01'::timestamp, 11.1),
(1, '2022-12-13 12:31:46.02'::timestamp, 11.2),
(1, '2022-12-13 12:31:46.03'::timestamp, 11.3)
)
select t1.*
,t2.datetime as next_datetime
,t2.price as next_price
from mytable as t1
left join mytable as t2
on t1.stock = t2.stock and dateadd(second, 1, t1.datetime) < t2.datetime
qualify row_number() over (partition by t1.stock, t1.datetime order by t2.datetime) = 1
order by 1,2;
STOCK | DATETIME | PRICE | NEXT_DATETIME | NEXT_PRICE |
---|---|---|---|---|
1 | 2022-12-13 12:31:45.000 | 10 | 2022-12-13 12:31:46.010 | 11.1 |
1 | 2022-12-13 12:31:45.010 | 10.1 | 2022-12-13 12:31:46.020 | 11.2 |
1 | 2022-12-13 12:31:45.020 | 10.2 | 2022-12-13 12:31:46.030 | 11.3 |
1 | 2022-12-13 12:31:46.000 | 11 | null | null |
1 | 2022-12-13 12:31:46.010 | 11.1 | null | null |
1 | 2022-12-13 12:31:46.020 | 11.2 | null | null |
1 | 2022-12-13 12:31:46.030 | 11.3 | null | null |
CodePudding user response:
You can use the LAG
function to get the value of datetime and price of the previous row (ordering the data on datetime) in the current row. It's then pretty straightforward to calculate the time and price difference.
You haven't specified what price_diff
should be if the time difference is <= 1 second so I have set it to NULL in this example.
WITH base AS (
SELECT
datetime
, price
, lag(datetime, 1) over (order by datetime) as prev_datetime
, lag(price, 1) over (order by datetime) as prev_price
FROM tableA
)
SELECT
datetime
, price
, prev_datetime
, prev_price
, CASE
WHEN datediff(second, prev_datetime, datetime) > 1
THEN price - prev_price
ELSE NULL
END AS price_diff
FROM base
ORDER BY datetime