Home > Net >  Query to find and subtract two timestamps associated with the same identifier
Query to find and subtract two timestamps associated with the same identifier

Time:01-03

I'm very new to BigQuery and not terribly familiar with SQL. I have a table of data that looks like this, where MyDate is a Timestamp object:

Row MyDate StateTransition MyIdentifier
1 2022-09-23 00:08:00 UTC Start 6371
2 2022-10-10 01:17:14 UTC Finished 6371
3 2022-09-26 04:51:40 UTC Start 7768
4 2022-10-05 03:44:32 UTC Finished 7768

etc.

My query looks something like

SELECT *
FROM <my-data-source>
WHERE (StateTransition="Start" OR StateTransition="Finished")
ORDER BY MyIdentifier, MyDate

What I'm trying to do is calculate the elapsed time (in days) between the Start and Finished timestamps associated with each MyIdentifier, and to have that displayed in another column. It could look like:

Row MyDate StateTransition MyIdentifier ElapsedTime
1 2022-09-23 00:08:00 UTC Start 6371
2 2022-10-10 01:17:14 UTC Finished 6371 0.33
3 2022-09-26 04:51:40 UTC Start 7768
4 2022-10-05 03:44:32 UTC Finished 7768 0.04

Alternatively, it could even be flattened a little to something like:

Row StartTransition FinishedTransition MyIdentifier ElapsedTime
1 2022-09-23 00:08:00 UTC 2022-10-10 01:17:14 UTC 6371 0.33
2 2022-09-26 04:51:40 UTC 2022-10-05 03:44:32 UTC 7768 0.04

I've tried looking through the BigQuery docs and Stack Overflow but haven't found anything that addresses this use case of selecting items from multiple rows with a common identifier and then performing an operation on them. It seems like subtracting the two timestamps would be done with the enter image description here

But for the intermediate result, we need a window function.

SELECT *,
       IF(
         StateTransition = 'Finished',
         TIMESTAMP_DIFF(MyDate, FIRST_VALUE(IF(StateTransition = 'Start', MyDate, NULL) IGNORE NULLS) OVER w, DAY),
         NULL
       ) AS ElapsedTime
  FROM sample_table
WINDOW w AS (PARTITION BY MyIdentifier ORDER BY MyDate);

enter image description here

and if you want flattend result from the above result (using a window function), the query will looks like below which shows same result as the first query using an aggregation.

SELECT MyIdentifier,
       FIRST_VALUE(IF(StateTransition = 'Start', MyDate, NULL) IGNORE NULLS) OVER w AS StartTransition,
       MyDate AS FinishedTransition,
       IF(
         StateTransition = 'Finished',
         TIMESTAMP_DIFF(MyDate, FIRST_VALUE(IF(StateTransition = 'Start', MyDate, NULL) IGNORE NULLS) OVER w, DAY),
         NULL
       ) AS ElapsedTime
  FROM sample_table
QUALIFY StateTransition = 'Finished'
WINDOW w AS (PARTITION BY MyIdentifier ORDER BY MyDate);

CodePudding user response:

I think that for each MyIdentifier you should have only one start and one finish, so you can simply split and join:

;WITH 
ts AS ( SELECT * FROM <my-data-source> WHERE StateTransition = 'Start'),
tf AS ( SELECT * FROM <my-data-source> WHERE StateTransition = 'Finished')
SELECT 
    ts.MyIdentifier, 
    ts.MyDate StartTransition, 
    tf.MyDate FinishedTransition, 
    TIMESTAMP_DIFF(ts.MyDate, tf.MyDate, DAY) ElapsedTime
FROM ts
LEFT JOIN tf on ts.MyIdentifier = tf.MyIdentifier
  • Related