Let's say your web application tracks the activities of its users using a tracking system. While a user hasn't logged in or signed up, all the user's actions are tracked using anonymous_id and the user_id is null, and afterward, they are tracked using the same anonymous_id and user_id. It is known that after a user logs in or signs up, the user_id is no longer null.
I am trying to write a query to find the last event where the user was tracked only by anonymous_id (column last_null) and the first event that was tracked by user_id (column first_notnull). The resulting table should be sorted by anonym_id.
For given table tracks;
received_at | event_name | anonymous_id | user_id |
---|---|---|---|
2016-01-01 12:13:12 | buttonClicked | 1 | NULL |
2016-01-02 12:14:15 | pageReloaded | 3 | NULL |
2016-02-02 13:15:13 | pageRendered | 2 | NULL |
2016-02-03 13:15:23 | commentWritten | 3 | NULL |
2016-03-03 14:15:15 | avatarUpdated | 2 | 2 |
2016-03-04 14:15:24 | statusUpdated | 1 | 1 |
Output should be;
anonym_id | last_null | first_notnull |
---|---|---|
1 | buttonClicked | statusUpdated |
2 | pageRendered | avatarUpdated |
3 | commentWritten | NULL |
Here is what I came up with, but I am not sure if there is a better solution. Can you please help me?
SELECT
distinct anonymous_id as anonym_id
,CASE
WHEN user_id is not null THEN prev_event
WHEN user_id is null THEN event_name
END AS last_null
,CASE
WHEN user_id is not null THEN event_name
WHEN user_id is null THEN next_event
END AS first_notnull
FROM(
SELECT
*
,max(rnk)
FROM(
SELECT
*
,lag(event_name) OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as prev_event
,lead(event_name) OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as next_event
,RANK() OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as rnk
FROM tracks
) as temp
GROUP BY anonymous_id
)as temp2
WHERE prev_event is not NULL
CodePudding user response:
I can think of a few ways to achieve this, this is one of it:
SELECT anonymous_id,
SUBSTRING_INDEX(
GROUP_CONCAT(
CASE WHEN user_id IS NULL
THEN event_name END
ORDER BY received_at DESC),',',1) AS last_null,
SUBSTRING_INDEX(
GROUP_CONCAT(
CASE WHEN user_id IS NOT NULL
THEN event_name END
ORDER BY received_at DESC),',',1) AS first_notnull
FROM tracks
GROUP BY anonymous_id;
Using CASE
expression then wrap it in GROUP_CONCAT()
with addition of ORDER BY received_at DESC
. Then use SUBSTRING_INDEX()
to get the value from SUBSTRING_INDEX()
. See this fiddle to understand more
This is the quickest I can think of right now but I'll try a few more ways (probably shorter and more efficient query) and will update the answer if necessary.