Home > Software design >  Iterate over data set and insert sequence ids for rows
Iterate over data set and insert sequence ids for rows

Time:02-17

I have a data set with more than 2 million rows in a Oracle SQL table and want to do some association analysis on this. To apply a sequence mining algorithm to this data, I need a column named 'sequenceId' and a column 'eventId'.

The table structure looks like this:

  • timestamp
  • product_id
  • user_id

Now I need an id that increments every time the user_id changes. How can I do that in Oracle SQL? I tried it in R but it takes more than 12 hours there...


Sample data:

timestamp product_id user_id
2019-10-01 12:12:24 UTC 3806 535447446
2019-10-01 19:51:55 UTC 3762 535447446
2019-10-02 18:09:34 UTC 3806 552286734
2019-10-02 17:54:01 UTC 3928 493964166

Expected result:

timestamp product_id user_id sequence id
2019-10-01 12:12:24 UTC 3806 535447446 1
2019-10-01 19:51:55 UTC 3762 535447446 1
2019-10-02 18:09:34 UTC 3806 552286734 2
2019-10-02 17:54:01 UTC 3928 493964166 3

The id should increment when the user_id changes

CodePudding user response:

WITH 
source_data AS (
    SELECT '2019-10-01 12:12:24' AS timestamp, 3806 AS product_id, 535447446 AS user_id FROM DUAL UNION ALL
    SELECT '2019-10-01 19:51:55', 3762, 535447446 FROM DUAL UNION ALL
    SELECT '2019-10-02 18:09:34', 3806, 552286734 FROM DUAL UNION ALL
    SELECT '2019-10-02 17:54:01', 3928, 493964166 FROM DUAL 
),
cte AS (
    SELECT timestamp,
           product_id,
           user_id,
           CASE WHEN user_id = LAG(user_id) OVER (ORDER BY timestamp) 
                THEN 0
                ELSE 1
                END new_user
    FROM source_data
)
SELECT timestamp,
       product_id,
       user_id,
       SUM(new_user) OVER (ORDER BY timestamp) sequence_id
FROM cte;
TIMESTAMP PRODUCT_ID USER_ID SEQUENCE_ID
2019-10-01 12:12:24 3806 535447446 1
2019-10-01 19:51:55 3762 535447446 1
2019-10-02 17:54:01 3928 493964166 2
2019-10-02 18:09:34 3806 552286734 3
  • Related