I have table in my Oracle database (v12.2.0.2.1) that tracks a users attempt to register an event and the timestamp of that attempt. The question I am trying to answer is the time difference between each users attempt.
The original table looks something like this:
USER_ACQUISITION_ATTEMPT_ID | USER_ID | RECORDING_DATE_ATTEMPT |
---|---|---|
1 | 189590 | 2022-OCT-12 14:57:54 |
2 | 189590 | 2022-OCT-12 14:58:15 |
3 | 189590 | 2022-OCT-12 15:01:50 |
4 | 189590 | 2022-OCT-12 15:02:11 |
5 | 189590 | 2022-OCT-12 15:02:40 |
6 | 189590 | 2022-OCT-12 15:03:51 |
7 | 22809 | 2022-OCT-12 15:04:32 |
8 | 189590 | 2022-OCT-12 15:09:05 |
9 | 189590 | 2022-OCT-12 15:10:16 |
10 | 189590 | 2022-OCT-12 15:10:24 |
11 | 189590 | 2022-OCT-12 15:10:42 |
12 | 189590 | 2022-OCT-12 15:13:00 |
13 | 178275 | 2022-OCT-12 15:45:22 |
14 | 180336 | 2022-OCT-12 15:53:25 |
15 | 180336 | 2022-OCT-12 15:53:50 |
Where each attempt is uniquely registered in order by the USER_ACQUISITION_ATTEMPT_ID. The USER_ID is the ID of the user that makes the attempt and the RECORDING_DATE_ATTEMPT is the timestamp.
I've created the following query that will show the time different between each attempt throughout the whole table using the LAG function to compare the dates from one row to the next and then just some math to calculate the time difference.
WITH dat AS
(
SELECT
a.USER_ACQUISITION_ATTEMPT_ID,
a.USER_ID,
a.RECORDING_DATE_ATTEMP DAT1,
a.RECORDING_DATE_ATTEMP DAT2
FROM
USER_ACQUISITION_ATTEMPT a
UNION ALL
SELECT
c.USER_ACQUISITION_ATTEMPT_ID,
c.USER_ID,
c.RECORDING_DATE_ATTEMP DAT1,
c.RECORDING_DATE_ATTEMP DAT2
FROM
USER_ACQUISITION_ATTEMPT c
)
SELECT
USER_ACQUISITION_ATTEMPT_ID,
USER_ID,
DAT1,
LAG(DAT1,1) OVER (ORDER BY DAT1) LAG,
TO_CHAR(DATE '1900-01-01' ABS (LAG(DAT1,1) OVER (ORDER BY DAT1) - DAT1), 'HH24:MI:SS') as TIME_DIFF
FROM dat
group by
USER_ACQUISITION_ATTEMPT_ID,
USER_ID,
DAT1
Which yields something like this:
USER_ACQUISITION_ATTEMPT_ID | USER_ID | DAT1 | LAG | TIME_DIFF |
---|---|---|---|---|
1 | 189590 | 2022-OCT-12 14:57:54 | NULL | NULL |
2 | 189590 | 2022-OCT-12 14:58:15 | 2022-OCT-12 14:57:54 | 00:00:21 |
3 | 189590 | 2022-OCT-12 15:01:50 | 2022-OCT-12 14:58:15 | 00:03:35 |
4 | 189590 | 2022-OCT-12 15:02:11 | 2022-OCT-12 15:01:50 | 00:00:21 |
5 | 189590 | 2022-OCT-12 15:02:40 | 2022-OCT-12 15:02:11 | 00:00:29 |
6 | 189590 | 2022-OCT-12 15:03:51 | 2022-OCT-12 15:02:40 | 00:01:11 |
7 | 22809 | 2022-OCT-12 15:04:32 | 2022-OCT-12 15:03:51 | 00:00:41 |
8 | 189590 | 2022-OCT-12 15:09:05 | 2022-OCT-12 15:04:32 | 00:04:33 |
9 | 189590 | 2022-OCT-12 15:10:16 | 2022-OCT-12 15:09:05 | 00:01:11 |
10 | 189590 | 2022-OCT-12 15:10:24 | 2022-OCT-12 15:10:16 | 00:00:08 |
11 | 189590 | 2022-OCT-12 15:10:42 | 2022-OCT-12 15:10:24 | 00:00:18 |
12 | 189590 | 2022-OCT-12 15:13:00 | 2022-OCT-12 15:10:42 | 00:02:18 |
13 | 178275 | 2022-OCT-12 15:45:22 | 2022-OCT-12 15:13:00 | 00:32:22 |
14 | 180336 | 2022-OCT-12 15:53:25 | 2022-OCT-12 15:45:22 | 00:08:03 |
15 | 180336 | 2022-OCT-12 15:53:50 | 2022-OCT-12 15:53:25 | 00:00:25 |
A mentioned, I want to look at each user individually. The above query doesn't do that. For example, in the result table you will see the first row has the very first attempt and its timestamp. Under the LAG and TIME_DIFF column, there are nulls because no other attempt came before it. Each subsequent row grabs the last DAT1 and places it in the next row under the LAG column. That works fine until the next user tries an attempt. Where you see the USER_ID change from 189590 on USER_ACQUISITION_ATTEMPT_ID #6 to USER_ID 22809 on USER_ACQUISITION_ATTEMPT_ID #7, the last time stamp from the previous user is used against the first timestamp of the next user which is not what I want. The first attempt for any given user should have NULL for both the LAG and TIME_DIFF columns so that the query can sort of reset the calculation of the time differences for each user.
I hope I have explained the issue clearly, however if clarification is needed, please add some comments.
CodePudding user response:
You don't need the UNION ALL
in the common table expression. It's just duplicating the data, which you then use GROUP BY
to de-duplicate it.
Removing the UNION ALL
means that you can remove the GROUP BY
.
Then just use PARTITION BY
in the analytic functions.
SELECT
USER_ACQUISITION_ATTEMPT_ID,
USER_ID,
RECORDING_DATE_ATTEMP,
LAG(RECORDING_DATE_ATTEMP)
OVER (
PARTITION BY USER_ID
ORDER BY RECORDING_DATE_ATTEMP
)
AS LAG_RECORDING_DATE_ATTEMP
FROM
USER_ACQUISITION_ATTEMPT