Home > Blockchain >  The LAG() function doesn't seem to work with strings (varchar type columns)
The LAG() function doesn't seem to work with strings (varchar type columns)

Time:05-05

I can't seem to get LAG() to work with strings, it seems like it wants to convert them to numeric values.

The first few rows of the CTE "event_stream_time_window" look like this (it is ordered by id and timestamp):

id os event category timestamp
1 ios launch start 2022-03-01 13:14:45
1 ios landing page - view landing page 2022-03-01 13:14:46
1 ios message - click messages 2022-03-01 13:14:50

I want to create a column named "lag_cat" that contains the value of the "category" column from the previous row, or if there is now row just contains the value of the "category" column of the current row. For example, I want the first few rows of the output to look something like this:

id os event category timestamp lag_cat
1 ios launch start 2022-03-01 13:14:45 start
1 ios landing page - view landing page 2022-03-01 13:14:46 landing page
1 ios message - click messages 2022-03-01 13:14:50 messages

There are multiple IDs in the table and I want this to be done within ID group, so I am using the following code:

SELECT 
  id,
  os,
  event,
  category,
  timestamp,
  LAG(category, 1, 0) OVER (PARTITION BY id ORDER BY timestamp) AS lag_cat
FROM event_stream_time_window

I get the following error: "Numeric value 'Start' is not recognized"

So it seems like LAG() is trying to convert "category" to a numeric type. Is there any way around this?

Thanks!

CodePudding user response:

The issue here is the default value for LAG which is equal 0 (data type numeric). Casting string literals "start"/"messages"/... to nummeric is not possible as they are not valid numbers, thus the error message.

If zero character is required as output then it should be provided as '0':

SELECT 
  id,
  os,
  event,
  category,
  timestamp,
  LAG(category, 1, '0') OVER (PARTITION BY id ORDER BY timestamp) AS lag_cat
FROM event_stream_time_window;

The mechanism that causes it is called "Implicit data type coercion"

CodePudding user response:

If you want the current rows CATEGORY if there is not prior, the defualt value should be CATEGORY not 0

SELECT   
    id,
    os,
    event,
    category,
    timestamp,
    lag(category,1,category) over (PARTITION BY id ORDER BY timestamp) as lag_cat
FROM VALUES
    (1, 'ios',  'launch','start','2022-03-01 13:14:45'),
    (1, 'ios',  'landing page - view','landing page','2022-03-01 13:14:46'),
    (1, 'ios',  'message - click','messages','2022-03-01 13:14:50')
     t(id, os, event, category, timestamp)
ID OS EVENT CATEGORY TIMESTAMP LAG_CAT
1 ios launch start 2022-03-01 13:14:45 start
1 ios landing page - view landing page 2022-03-01 13:14:46 start
1 ios message - click messages 2022-03-01 13:14:50 landing page
  • Related