Home > OS >  How to find consecutive rows between rows identified by a different column?
How to find consecutive rows between rows identified by a different column?

Time:10-30

Apologies if the title is unclear, the question is difficult to phrase.

I have a dataset that describes the positions of football players over time on a field during various plays. The table I'm trying to query has a frame_id column that is an incrementing counter, with there being 10 frames per second.

The table also has an event column, which describes an event that may have happened that frame, like ball_snap or kick_received. In the course of a play, some of these events can be guaranteed to happen in a specific order.

I want to find the frames that occur between two events.

If I want to get an ordered set of frame_ids, between two events defined in a different column, how would I go about that?

My first attempt was:

select game_id, play_id, min(frame_id) as snap_frame, max(frame_id) as tackle_frame
        from tracking
        where event = 'ball_snap'
        or event = 'tackle'
        group by game_id, play_id;

but this did not work. I seem to be getting some undefined behavior where this works for some records but not others

Edit to include some data

CSV: https://gist.github.com/bholzer/bae669a6e70f68807717f79a66af7a54

game_id play_id frame_id event
10 20 10
10 20 11 ball_snap
... ... ... ...
10 20 77
10 20 78 punt_received

I want a query that can get everything between frame 11 and 78 in this data

CodePudding user response:

There are many ways to solve it, one of them is using the EXISTS operator:

SELECT *
FROM tracking t
WHERE EXISTS (
  SELECT 1 FROM tracking te
  WHERE t.game_id = te.game_id 
    AND t.play_id = te.play_id
    AND te.event IN ('ball_snap', 'punt_received')
  HAVING t.frame_id > MIN(te.frame_id) AND t.frame_id < MAX(te.frame_id) 
)

db<>fiddle

CodePudding user response:

You have two different questions in your post :

If I want to get an ordered set of frame_ids, between two events defined in a different column, how would I go about that?

As far as I understand your tracking table structure, the same event name will appear in many rows of the table because it can occur several times during the game, so you can't use the column event in order to identify one specific event which occurs at one time, and as a consequence, you can't select all the events between two events characterized by their name only.

I want a query that can get everything between frame 11 and 78 in this data

If we assume that the frame_id values are ordered according to the timeline, then the query that achieves this is :

SELECT * FROM tracking WHERE frame_id BETWEEN 11 AND 78
  • Related