I have two date columns in BQ table. pageview_date
and edited_date
, as well as id
column. I need to output the data - row by row - and for each record I want to grab a value from edited_date
column that is the latest date in that column BUT is not later than the pageview_date
value itself. And if both dates are equal then leave it as is. It also has to correspond with the ids. Data looks like this:
id pageview_date edited_date
A 03/01/22 02/28/22
A 03/01/22 02/02/22
A 03/01/22 02/02/22
B 03/01/22 01/01/22
B 03/01/22 01/01/22
B 03/01/22 01/31/22
C 03/01/22 04/01/22
C 03/01/22 03/25/22
C 03/01/22 03/01/22
Desired output is:
id pageview_date edited_date
A 03/01/22 02/28/22
A 03/01/22 02/28/22
A 03/01/22 02/28/22
B 03/01/22 01/31/22
B 03/01/22 01/31/22
B 03/01/22 01/31/22
C 03/01/22 03/01/22
C 03/01/22 03/01/22
C 03/01/22 03/01/22
CodePudding user response:
An approach is use the MAX
window function in the edited_date
column partitioned by the id
:
with sample as (
select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-02-28') as edited_date
UNION ALL
select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-03-28') as edited_date
UNION ALL
select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-01-28') as edited_date
)
SELECT
id,
pageview_date,
MAX(IF(edited_date <= pageview_date, edited_date, null)) OVER (PARTITION BY id) as new_edited_date
FROM sample
Note if there's no edited_date
before an pageview_date, the new_edited_date
will be null
.