Home > Net >  finding the first row that is greater than the time offset for each row in sql
finding the first row that is greater than the time offset for each row in sql

Time:12-13

I have a table with columns timestamp, price. I want to find whether the price went up or down after 1 second. The timestamp is in snowflake's TIMESTAMP_TZ format, i.e., 2022-12-12 9:30:00.0 01:00.

If I were to do this using a for loop, it would look something like this:

# pseudocode in python
price_diff = {}
for i in range(len(table)):
  current_row = table[i]
  for j in range(i, -1, -1):
     prev_row = table[j]
     if current_row.datetime - prev_row.datetime >= 1 second:
        price_diff[current_row.index] = current_row.price - prev_row.price
        break

Here's what I've tried:

SELECT table1.datetime, table1.price, 
(
  SELECT table2.price
  FROM mytable table2
  WHERE
  timediff(second, table1.datetime, table2.datetime) > 1 
  ORDER BY
  table2.datetime
  DESC
  LIMIT 1
) AS price_tm1,
(table1.price - price_tm1) AS price_diff
FROM
mytable table1

This gives me an error saying: SQL compilation error: Unsupported subquery type cannot be evaluated.

Any suggestions would be much appreciated!

CodePudding user response:

One way it to use lead to get the "next" but this will only find when the inter message values is over 1 second, thus with this data:

with mytable(stock, datetime, price) as (
    select * from values
    (1, '2022-12-13 12:31:45.00'::timestamp, 10.0),
    (1, '2022-12-13 12:31:45.01'::timestamp, 10.1),
    (1, '2022-12-13 12:31:45.02'::timestamp, 10.2),
    (1, '2022-12-13 12:31:48.00'::timestamp, 11.0)  
)
select *
    ,lead(datetime) over (partition by stock order by datetime) as lead_datetime
    ,lead(price) over (partition by stock order by datetime) as lead_price
    ,timediff(second, table1.datetime, lead_datetime) as gap
    ,iff(gap > 1, lead_price, null) as next_second_plus_price
from mytable as table1
order by datetime
;
STOCK DATETIME PRICE LEAD_DATETIME LEAD_PRICE GAP NEXT_SECOND_PLUS_PRICE
1 2022-12-13 12:31:45.000 10 2022-12-13 12:31:45.010 10.1 0 null
1 2022-12-13 12:31:45.010 10.1 2022-12-13 12:31:45.020 10.2 0 null
1 2022-12-13 12:31:45.020 10.2 2022-12-13 12:31:48.000 11 3 11
1 2022-12-13 12:31:48.000 11 null null null null

but if you really want the next per row, then you want to use a join, and the filter away the unwanted rows with a qualify:

with mytable(stock, datetime, price) as (
    select * from values
    (1, '2022-12-13 12:31:45.00'::timestamp, 10.0),
    (1, '2022-12-13 12:31:45.01'::timestamp, 10.1),
    (1, '2022-12-13 12:31:45.02'::timestamp, 10.2),
    (1, '2022-12-13 12:31:46.00'::timestamp, 11.0),  
    (1, '2022-12-13 12:31:46.01'::timestamp, 11.1),  
    (1, '2022-12-13 12:31:46.02'::timestamp, 11.2),  
    (1, '2022-12-13 12:31:46.03'::timestamp, 11.3)  
)
select t1.*
    ,t2.datetime as next_datetime
    ,t2.price as next_price
from mytable as t1
left join mytable as t2
    on t1.stock = t2.stock and dateadd(second, 1, t1.datetime) < t2.datetime
qualify row_number() over (partition by t1.stock, t1.datetime order by t2.datetime) = 1
order by 1,2;   
STOCK DATETIME PRICE NEXT_DATETIME NEXT_PRICE
1 2022-12-13 12:31:45.000 10 2022-12-13 12:31:46.010 11.1
1 2022-12-13 12:31:45.010 10.1 2022-12-13 12:31:46.020 11.2
1 2022-12-13 12:31:45.020 10.2 2022-12-13 12:31:46.030 11.3
1 2022-12-13 12:31:46.000 11 null null
1 2022-12-13 12:31:46.010 11.1 null null
1 2022-12-13 12:31:46.020 11.2 null null
1 2022-12-13 12:31:46.030 11.3 null null

CodePudding user response:

You can use the LAG function to get the value of datetime and price of the previous row (ordering the data on datetime) in the current row. It's then pretty straightforward to calculate the time and price difference.

You haven't specified what price_diff should be if the time difference is <= 1 second so I have set it to NULL in this example.

WITH base AS (
    SELECT 
          datetime
        , price
        , lag(datetime, 1) over (order by datetime) as prev_datetime
        , lag(price, 1) over (order by datetime) as prev_price
    FROM tableA
)
SELECT 
      datetime
    , price
    , prev_datetime
    , prev_price
    , CASE 
        WHEN datediff(second, prev_datetime, datetime) > 1
            THEN price - prev_price
        ELSE NULL 
      END AS price_diff
FROM base 
ORDER BY datetime
  • Related