Apologies if the title is unclear, the question is difficult to phrase.
I have a dataset that describes the positions of football players over time on a field during various plays. The table I'm trying to query has a frame_id
column that is an incrementing counter, with there being 10 frames per second.
The table also has an event
column, which describes an event that may have happened that frame, like ball_snap
or kick_received
. In the course of a play, some of these events can be guaranteed to happen in a specific order.
I want to find the frames that occur between two events.
If I want to get an ordered set of frame_id
s, between two events defined in a different column, how would I go about that?
My first attempt was:
select game_id, play_id, min(frame_id) as snap_frame, max(frame_id) as tackle_frame
from tracking
where event = 'ball_snap'
or event = 'tackle'
group by game_id, play_id;
but this did not work. I seem to be getting some undefined behavior where this works for some records but not others
Edit to include some data
CSV: https://gist.github.com/bholzer/bae669a6e70f68807717f79a66af7a54
game_id | play_id | frame_id | event |
---|---|---|---|
10 | 20 | 10 | |
10 | 20 | 11 | ball_snap |
... | ... | ... | ... |
10 | 20 | 77 | |
10 | 20 | 78 | punt_received |
I want a query that can get everything between frame 11 and 78 in this data
CodePudding user response:
There are many ways to solve it, one of them is using the EXISTS operator:
SELECT *
FROM tracking t
WHERE EXISTS (
SELECT 1 FROM tracking te
WHERE t.game_id = te.game_id
AND t.play_id = te.play_id
AND te.event IN ('ball_snap', 'punt_received')
HAVING t.frame_id > MIN(te.frame_id) AND t.frame_id < MAX(te.frame_id)
)
CodePudding user response:
You have two different questions in your post :
If I want to get an ordered set of frame_ids, between two events defined in a different column, how would I go about that?
As far as I understand your tracking
table structure, the same event name will appear in many rows of the table because it can occur several times during the game, so you can't use the column event
in order to identify one specific event which occurs at one time, and as a consequence, you can't select all the events between two events characterized by their name only.
I want a query that can get everything between frame 11 and 78 in this data
If we assume that the frame_id values are ordered according to the timeline, then the query that achieves this is :
SELECT * FROM tracking WHERE frame_id BETWEEN 11 AND 78