Home > Software engineering >  Presto lag dates, group/partitioned by id
Presto lag dates, group/partitioned by id

Time:02-12

Say that I want to find every time that a client updated their budget.

Here's what my data looks like

datetime, client_id, new_budget
__________,__________,___________
2022-01-01,1,        ,100
2022-01-01,2,        ,300
2022-01-02,1,        ,80
2022-01-02,2,        ,80

And the code I've run.

SELECT datetime AS dt_1,
       LAG(datetime) OVER (ORDER BY client_id, datetime) AS dt_2,
       client_id,
       new_budget
FROM budget_table

What I'm expecting returned will be

dt_1,      dt_2,      client_id, new_budget
__________,__________,__________,___________
2022-01-01,NULL,      1         , 100
2022-01-02,2022-01-01,1         , 80
2022-01-01,NULL,      2         , 300
2022-01-02,2022-01-01,2         , 80 

Hence there are NULL values for dt_2 in the first entry of each client_id. I'm not sure what code will accomplish this effect; is a GROUP BY clause will be necessary (or a partition over clause.)

But here's the output to the SQL that I ran

dt_1,      dt_2,      client_id, new_budget
__________,__________,__________,___________
2022-01-01,NULL,      1         , 100
2022-01-02,2022-01-01,1         , 80
2022-01-01,2022-01-02,2         , 300
2022-01-02,2022-01-01,2         , 80 

So the huge issue here is that it's not recognizing that the dt_2 should be NULL if the previous row is from a different client_id.

Which syntax is recommended to accomplish this effect?

CodePudding user response:

You need to partition by client_id:

The PARTITION BY clause separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions. If PARTITION BY is not specified, the entire input is treated as a single partition.

SELECT datetime AS dt_1,
       LAG(datetime) OVER (PARTITION BY client_id ORDER BY datetime) AS dt_2,
       client_id,
       new_budget
FROM budget_table
  • Related