I have "customers" table with this data:
cust_id | start_date | end_date | street_id | hood_id | city_id |
---|---|---|---|---|---|
569 | 1/1/2022 | 1/15/2022 | 12 | 17 | 16 |
569 | 1/19/2022 | 1/22/2022 | 9 | 85 | 15 |
569 | 1/25/2022 | 2/25/2022 | 14 | 65 | 19 |
241 | 3/13/2022 | 3/17/2022 | 3 | 6 | 19 |
241 | 3/19/2022 | 3/22/2022 | 5 | 8 | 35 |
241 | 3/25/2022 | 3/29/2022 | 5 | 8 | 488 |
I want to stay just with one raw per cust_id and also to stay with the last changes of street_id, hood_id, and city_id columns. also, I want to create a new sum column that brings me the "total_days" (end_date - start_date). its looks like this:
cust_id | street_id | hood_id | city_id | total_days |
---|---|---|---|---|
569 | 14 | 65 | 19 | 48 |
241 | 5 | 8 | 488 | 11 |
if it will be in presto syntax it will be wonderful.
thanks all.
CodePudding user response:
You can get the latest row per customer with row_number()
and filtering, then do the date arithmetic:
select cust_id, street_id, hood_id, city_id,
date_diff('day', start_date, end_date) as total_days
from (
select c.*, row_number() over(partition by cust_id order by end_date desc) rn
from customers c
) c
where rn = 1
Ah, and you seem to want the overall date difference per customer rather than that of the last row, so we can use more window functions:
select cust_id, street_id, hood_id, city_id, total_days
from (
select c.*,
row_number() over(partition by cust_id order by end_date desc) rn,
sum(date_diff('day', start_date, end_date)) over(partition by cust_id) as total_days
from customers c
) c
where rn = 1
CodePudding user response:
Looks like job for GROUP BY
combined with some aggregation functions (especially very useful in such cases max_by
) and dates handling (parsing dates and counting diff in days):
-- sample data
with dataset(cust_id, start_date, end_date, street_id, hood_id, city_id) as(
values (569, '1/1/2022', '1/15/2022', 12, 17, 16),
(569, '1/19/2022', '1/22/2022', 9, 85, 15),
(569, '1/25/2022', '2/25/2022', 14, 65, 19),
(241, '3/13/2022', '3/17/2022', 3, 6, 19),
(241, '3/19/2022', '3/22/2022', 5, 8, 35),
(241, '3/25/2022', '3/29/2022', 5, 8, 488)
)
-- query
select cust_id,
max_by(street_id, end_date) street_id,
max_by(hood_id, end_date) hood_id,
max_by(city_id, end_date) city_id,
sum(date_diff('day', date_parse(start_date, '%c/%e/%Y'), date_parse(end_date, '%c/%e/%Y'))) total_days
from dataset
group by cust_id;
Output:
cust_id | street_id | hood_id | city_id | total_days |
---|---|---|---|---|
241 | 5 | 8 | 488 | 11 |
569 | 14 | 65 | 19 | 48 |