I can't seem to get LAG() to work with strings, it seems like it wants to convert them to numeric values.
The first few rows of the CTE "event_stream_time_window" look like this (it is ordered by id and timestamp):
id | os | event | category | timestamp |
---|---|---|---|---|
1 | ios | launch | start | 2022-03-01 13:14:45 |
1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 |
1 | ios | message - click | messages | 2022-03-01 13:14:50 |
I want to create a column named "lag_cat" that contains the value of the "category" column from the previous row, or if there is now row just contains the value of the "category" column of the current row. For example, I want the first few rows of the output to look something like this:
id | os | event | category | timestamp | lag_cat |
---|---|---|---|---|---|
1 | ios | launch | start | 2022-03-01 13:14:45 | start |
1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 | landing page |
1 | ios | message - click | messages | 2022-03-01 13:14:50 | messages |
There are multiple IDs in the table and I want this to be done within ID group, so I am using the following code:
SELECT
id,
os,
event,
category,
timestamp,
LAG(category, 1, 0) OVER (PARTITION BY id ORDER BY timestamp) AS lag_cat
FROM event_stream_time_window
I get the following error: "Numeric value 'Start' is not recognized"
So it seems like LAG() is trying to convert "category" to a numeric type. Is there any way around this?
Thanks!
CodePudding user response:
The issue here is the default value for LAG
which is equal 0
(data type numeric). Casting string literals "start"
/"messages"
/... to nummeric is not possible as they are not valid numbers, thus the error message.
If zero character is required as output then it should be provided as '0'
:
SELECT
id,
os,
event,
category,
timestamp,
LAG(category, 1, '0') OVER (PARTITION BY id ORDER BY timestamp) AS lag_cat
FROM event_stream_time_window;
The mechanism that causes it is called "Implicit data type coercion"
CodePudding user response:
If you want the current rows CATEGORY
if there is not prior, the defualt value should be CATEGORY
not 0
SELECT
id,
os,
event,
category,
timestamp,
lag(category,1,category) over (PARTITION BY id ORDER BY timestamp) as lag_cat
FROM VALUES
(1, 'ios', 'launch','start','2022-03-01 13:14:45'),
(1, 'ios', 'landing page - view','landing page','2022-03-01 13:14:46'),
(1, 'ios', 'message - click','messages','2022-03-01 13:14:50')
t(id, os, event, category, timestamp)
ID | OS | EVENT | CATEGORY | TIMESTAMP | LAG_CAT |
---|---|---|---|---|---|
1 | ios | launch | start | 2022-03-01 13:14:45 | start |
1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 | start |
1 | ios | message - click | messages | 2022-03-01 13:14:50 | landing page |