Home > Mobile >  one row for rec_id with the last change and total days | sql
one row for rec_id with the last change and total days | sql

Time:01-02

I have "customers" table with this data:

cust_id start_date end_date street_id hood_id city_id
569 1/1/2022 1/15/2022 12 17 16
569 1/19/2022 1/22/2022 9 85 15
569 1/25/2022 2/25/2022 14 65 19
241 3/13/2022 3/17/2022 3 6 19
241 3/19/2022 3/22/2022 5 8 35
241 3/25/2022 3/29/2022 5 8 488

I want to stay just with one raw per cust_id and also to stay with the last changes of street_id, hood_id, and city_id columns. also, I want to create a new sum column that brings me the "total_days" (end_date - start_date). its looks like this:

cust_id street_id hood_id city_id total_days
569 14 65 19 48
241 5 8 488 11

if it will be in presto syntax it will be wonderful.

thanks all.

CodePudding user response:

You can get the latest row per customer with row_number() and filtering, then do the date arithmetic:

select cust_id, street_id, hood_id, city_id,
    date_diff('day', start_date, end_date) as total_days
from (
    select c.*, row_number() over(partition by cust_id order by end_date desc) rn
    from customers c
) c
where rn = 1

Ah, and you seem to want the overall date difference per customer rather than that of the last row, so we can use more window functions:

select cust_id, street_id, hood_id, city_id, total_days
from (
    select c.*, 
        row_number() over(partition by cust_id order by end_date desc) rn,
        sum(date_diff('day', start_date, end_date)) over(partition by cust_id) as total_days
    from customers c
) c
where rn = 1

CodePudding user response:

Looks like job for GROUP BY combined with some aggregation functions (especially very useful in such cases max_by) and dates handling (parsing dates and counting diff in days):

-- sample data
with dataset(cust_id, start_date, end_date, street_id, hood_id, city_id) as(
    values (569, '1/1/2022', '1/15/2022', 12,   17, 16),
    (569, '1/19/2022', '1/22/2022', 9,  85, 15),
    (569, '1/25/2022', '2/25/2022', 14, 65, 19),
    (241, '3/13/2022', '3/17/2022', 3,  6, 19),
    (241, '3/19/2022', '3/22/2022', 5,  8, 35),
    (241, '3/25/2022', '3/29/2022', 5,  8, 488)
)

-- query
select cust_id,
    max_by(street_id, end_date) street_id,
    max_by(hood_id, end_date) hood_id,
    max_by(city_id, end_date) city_id,
    sum(date_diff('day', date_parse(start_date, '%c/%e/%Y'), date_parse(end_date, '%c/%e/%Y'))) total_days
from dataset
group by cust_id;

Output:

cust_id street_id hood_id city_id total_days
241 5 8 488 11
569 14 65 19 48
  • Related