SQL - select column value of previous matching row-CodePudding

I have a dataset like the above. Basically it's like a website session which I have a start end activity, and in between there could be other unrelated records.

I need to pull the records based on:

I need to report all the "end" activity from the table
In each "end" record, I need to find its nearest previous "start" row's "Content_ID" value. The record must also match the "Profile_ID" as well
The result I expected is as follows:

May I ask how could I construct the SQL to do this?

Thanks so much!

CodePudding user response：

You might consider below in BigQuery.

SELECT * EXCEPT(Content_ID),
       LAST_VALUE(IF(Activity = 'start', Content_ID, NULL) IGNORE NULLS) OVER w AS Content_ID
  FROM sample_table
QUALIFY Activity = 'end'
WINDOW w AS (PARTITION BY Profile_ID ORDER BY Created_Time);

Query results

CodePudding user response：

We can address this as a gaps-and-island problem, where an island begins every time an activity starts for a given profile_id: we can define the groups with a window sum.

Then, for every end activity, we need to report the content id of the beginning of the island.

select profile_id, created_time,
    max(case when activity = 'start' then content_id end) 
        over(partition by profile_id, grp) content_id
from (
    select t.*,
        sum(case when activity = 'start' then 1 else 0 end) 
            over(partition by profile_id order by created_time) grp
    from mytable t
) t
where activity = 'end'

Depending on the database you actually use (BQ or MySQL), some syntax shortcuts exists to shorten the conditional expressions. In BQ for example, we would phrase the window sum with countif:

countif(activity = 'start') 
    over(partition by profile_id order by created_time) grp