In BigQuery, I'm trying to find the total number of events before the second login time.
For different user ids, I have multiple events such as "scroll," "user engagement," "log in," "first_visit," "sign_up" etc. For simplicity, let's consider the above as all the events.
For instance, for user_id 2, I have the following information extracted from the raw data (this is a snapshot of the table).
User_id | Event_name | EventTime |
---|---|---|
2 | scroll | 2022-10-31 12:28:35 |
2 | sign_up | 2022-10-29 08:11:29 |
2 | login | 2022-11-01 16:46:34 |
2 | first_visit | 2022-10-30 10:45:22 |
2 | login | 2022-11-04 08:10:38 |
2 | scroll | 2022-11-05 11:18:35 |
2 | user engagement | 2022-11-06 08:45:17 |
2 | user engagement | 2022-11-07 05:27:32 |
First, I found the second login time for each user id.
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY User_id ORDER BY LoginTime) rnk
FROM MyData
)
SELECT User_id, LoginTime AS SecondLoginTime
FROM cte
WHERE rnk = 2
ORDER BY User_id;
User_id | SecondLoginTime |
---|---|
1 | 2022-11-07 09:52:27 |
2 | 2022-11-04 08:10:38 |
I wanted to write a query where I can compare this login time to each event time (for all user ids)and count the events before the SecondLoginTime.
For instance, for user_id 2, I want to make the following comparison:
User_id | Event_name | EventTime | SecondLoginTime |
---|---|---|---|
2 | scroll | 2022-10-31 12:28:35 | 2022-11-04 08:10:38 |
2 | sign_up | 2022-10-29 08:11:29 | 2022-11-04 08:10:38 |
2 | login | 2022-11-01 16:46:34 | 2022-11-04 08:10:38 |
2 | first_visit | 2022-10-30 10:45:22 | 2022-11-04 08:10:38 |
2 | login | 2022-11-04 08:10:38 | 2022-11-04 08:10:38 |
2 | scroll | 2022-11-05 11:18:35 | 2022-11-04 08:10:38 |
2 | user engagement | 2022-11-06 08:45:17 | 2022-11-04 08:10:38 |
2 | user engagement | 2022-11-07 05:27:32 | 2022-11-04 08:10:38 |
And find the followig result:
User_id | TotalEventsBeforeFirstVisit |
---|---|
2 | 4 |
I also want to apply this logic to all user ids.
Is there a way to do this? Please kindly ask me for clarification if there is anything misssing or if the problem is unclear. I'd really appreciate your help.
CodePudding user response:
This should work:
with LI2 AS (
SELECT User_Id, EventTime as LoginTime, ROW_NUMBER() OVER (PARTITION BY User_id ORDER BY EventTime) Seq
FROM MyTbl
where Event_Name='login'
)
SELECT
LI2.User_id
, LI2.LoginTime AS SecondLoginTime
, COUNT(OE.User_ID) as EventsBefore2ndLogin
FROM LI2
left join
MyTbl OE -- other events before 2nd login
on OE.User_Id=LI2.USer_Id
and OE.EventTime<LI2.LoginTime
WHERE LI2.Seq = 2
GROUP BY
LI2.User_id
, LI2.LoginTime
ORDER BY LI2.User_id
MyTbl is your table.
CodePudding user response:
You might consider below.
WITH cte AS (
SELECT *, NTH_VALUE(IF(Event_name = 'login', EventTime, NULL), 2 IGNORE NULLS) OVER w AS SecondLoginTime
FROM MyData
WINDOW w AS (PARTITION BY User_id ORDER BY EventTime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
SELECT User_id, COUNTIF(EventTime < SecondLoginTime) AS TotalEventsBeforeSecondLogin
FROM cte
GROUP BY 1;
Query results