Home > Enterprise >  grab the latest date record that is not later than date in another column SQL BigQuery
grab the latest date record that is not later than date in another column SQL BigQuery

Time:06-02

I have two date columns in BQ table. pageview_date and edited_date, as well as id column. I need to output the data - row by row - and for each record I want to grab a value from edited_date column that is the latest date in that column BUT is not later than the pageview_date value itself. And if both dates are equal then leave it as is. It also has to correspond with the ids. Data looks like this:

id         pageview_date         edited_date
A            03/01/22               02/28/22
A            03/01/22               02/02/22
A            03/01/22               02/02/22
B            03/01/22               01/01/22
B            03/01/22               01/01/22
B            03/01/22               01/31/22
C            03/01/22               04/01/22
C            03/01/22               03/25/22
C            03/01/22               03/01/22

Desired output is:

id         pageview_date         edited_date
A            03/01/22               02/28/22
A            03/01/22               02/28/22
A            03/01/22               02/28/22
B            03/01/22               01/31/22
B            03/01/22               01/31/22
B            03/01/22               01/31/22
C            03/01/22               03/01/22
C            03/01/22               03/01/22
C            03/01/22               03/01/22

CodePudding user response:

An approach is use the MAX window function in the edited_date column partitioned by the id:

with sample as (
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-02-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-03-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-01-28') as edited_date
)
SELECT
  id,
  pageview_date,
  MAX(IF(edited_date <= pageview_date, edited_date, null)) OVER (PARTITION BY id) as new_edited_date
FROM sample

Note if there's no edited_date before an pageview_date, the new_edited_date will be null.

  • Related