Home > other >  partition over period of time for a given column SQL BigQuery
partition over period of time for a given column SQL BigQuery

Time:06-10

My query outputs data - row by row - for each record I grab a value from last_modified_date column that is the latest date in that column BUT is not later than the date column's value. I save new value in column custom_last_modified_date. Code works fine, but it only checks for the latest date available for one date - instead of every row in a table. Data looks like this:

id           date           last_modified_date
A           02/28/22          2017-02-28 22:44
A           03/05/22          2017-02-28 05:14
A           03/05/22          2017-02-28 07:49
A           03/22/22          2017-02-28 06:09
A           03/22/22          2022-03-01 06:49
B           03/25/22          2022-03-20 07:49
B           03/25/22          2022-04-01 09:24 

Code:

SELECT 
id, 
date,
MAX(
        IF(
            date(
                string(
                    TIMESTAMP(
                        DATETIME(
                            parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
                        )
                    )
                )
            ) <= date,
            date(
                string(
                    TIMESTAMP(
                        DATETIME(
                            parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
                        )
                    )
                )
            ),
            null
        )
    ) OVER (PARTITION BY date, id) as custom_last_modified_date
FROM `my_table`

Output is this:

id           date        custom_last_modified_date
A           02/28/22          02/28/17
A           03/05/22          02/28/17
A           03/05/22          02/28/17
A           03/22/22          03/01/22
A           03/22/22          03/01/22
B           03/25/22          03/20/22
B           03/25/22          03/20/22

Desired output is:

id           date        custom_last_modified_date
A           02/28/22          02/28/22
A           03/05/22          03/01/22
A           03/05/22          03/01/22
A           03/22/22          03/01/22
A           03/22/22          03/01/22
B           03/25/22          03/20/22
B           03/25/22          03/20/22

CodePudding user response:

I tried your query and your just have a mistake on partion by, your query will work if you removed date in your partition by clause. Your query should look like this:

NOTE: I just created a CTE to convert dates to make it readable.

with sample_data as (
select 'A' as id, '03/05/22' as date_field, '2017-02-28 22:44' as last_modified_date,
union all select 'A' as id, '03/05/22' as date_field, '2017-02-28 05:14' as last_modified_date,
union all select 'A' as id, '03/05/22' as date_field, '2017-02-28 07:49' as last_modified_date,
union all select 'A' as id, '03/22/22' as date_field, '2017-02-28 06:09' as last_modified_date,
union all select 'A' as id, '03/22/22' as date_field, '2022-03-01 06:49' as last_modified_date,
union all select 'B' as id, '03/25/22' as date_field, '2022-03-20 07:49' as last_modified_date,
union all select 'B' as id, '03/25/22' as date_field, '2022-04-01 09:24' as last_modified_date,
),

conv_data as (

select 
id,
format_datetime('%m/%d/%y',parse_date('%m/%d/%y',date_field)) as date_field,
format_datetime('%m/%d/%y',parse_datetime('%Y-%m-%d %H:%M', last_modified_date)) as last_modified_date,
from sample_data
)

select 
id,
date_field,
last_modified_date,
max(if(last_modified_date <= date_field, last_modified_date, null)) over (partition by id) as custom_last_modified_date,

from conv_data

Output:

enter image description here

  • Related