I'm trying to return a set of rows where if a car_id
in the table below contains a duplicate, just use the most recent entry by created_on
and apply the WHERE
clause to the most recent entry. In which case if the most recent entry doesn't satisfy, do not return anything.
SQLFiddle: For the question - http://sqlfiddle.com/#!17/cc0a5/2
Note: Please see edit - statement at the end of question for further clarification
Data
Table: car_wash
Here is a sample table with all the data.
id | car_id | next_clean | end_at | created_on |
---|---|---|---|---|
79d | d48 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235425 00:00 |
1f3 | 645 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235434 00:00 |
397 | d41 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235435 00:00 |
435 | 372 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235436 00:00 |
3d5 | d41 | 2022-06-25 18:32:52.068891 00:00 | 2022-06-25 18:34:32.269289 00:00 | 2022-06-25 18:34:32.269252 00:00 |
2df | 372 | 2022-07-25 18:35:32.068891 00:00 | 2022-06-25 18:35:32.269293 00:00 | 2022-06-25 18:35:32.269291 00:00 |
Based on the data, I want to retrieve rows of cars that need to be cleaned i.e their next_clean
date has elapsed, ~and the log of their previous recorded clean (end_at
) is NULL, i.e the clean never took place or finished.~
In the event a car appears multiple times e.g. car_id: 372
for id: 435
and id: 2df
, then only take into consideration the most recent (created_on
) record i.e id: 2df
and if id: 2df
meets the criteria mentioned above return it, if not do not return it or even id: 372
. I only care about the most recent entry.
For the attempted postgres query below, assume that now()
takes place between the next_clean
dates for id: 435
and id: 2df
(a month in the future), meaning 2df
is in the future and will fail the WHERE
check. For those wondering why id: 2df
has a populated end_at
field even though the next_clean
hasn't taken place. I'm using the end_at
to signify a clean happened, and the next_clean
stores when a new clean should take place)
What I Have Tried
Based on numerous solutions, the use of window functions was suggested, but I am not sure I'm applying it correctly
SELECT *
FROM (SELECT
id,
car_id,
next_clean,
end_at,
created_on,
row_number() over (partition by car_id order by created_on desc) as rn
FROM car_wash
WHERE car_wash.user_id = 'some-id'
AND (car_wash.next_clean <= now())
AND car_wash.end_at IS NULL
LIMIT 100 OFFSET 0) t
WHERE rn = 1
Result
id | car_id | next_clean | end_at | created_on | rn |
---|---|---|---|---|---|
435 | 372 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235436 00:00 | 1 |
1f3 | 645 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235434 00:00 | 1 |
397 | d41 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235435 00:00 | 1 |
79d | d48 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235425 00:00 | 1 |
The problem with this result is that since id: 2df
fails the check as it's next_clean
is in the future, id: 435
for car_id: 372
is returned which is not what I want (as it unfortunately does pass the check, but I only want to focus on the most recently created car_id
).
Intended/Expected
id | car_id | next_clean | end_at | created_on | rn |
---|---|---|---|---|---|
1f3 | 645 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235434 00:00 | 1 |
397 | d41 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235435 00:00 | 1 |
79d | d48 | 2022-06-25 18:29:32.210955 00:00 | NULL | 2022-06-25 18:34:32.235425 00:00 | 1 |
Edit: @jholkin raised a good point regarding why we return id: 3d5
in the first place since end_at
is not null. It introduces conflicting priorities. So ideally the WHERE
clause should only care that the created_on
is most recent if there are duplicates, and that the next_clean
is <= now()
. The expected result is still the same as now end_at
isn't a factor.
CodePudding user response:
You can use DISTINCT ON
to produce only the first row per group (per car_id
) according to an ordering (created_on DESC
). Then, filtering is trivial.
For example:
select *
from (
select distinct on (car_id) *
from car_wash
order by car_id, created_on desc
) x
where next_clean <= now() and end_at is null
CodePudding user response:
if the condition says, that when there are more than two records of the same car_id, it should take the most recent one based on the create_on field and this record must have null value in its end_at field, why should it return the record with id 397? If the most current create_on value is id 3d5. this query should help you:
select t.*
from ( select distinct on (cw.car_id) cw.id, cw.car_id ,
last_value (cw.next_clean) over (partition by cw.next_clean order by cw.next_clean desc) as next_clean ,
cw.end_at ,
(cw.created_on) as created_on from car_wash cw ) t
where t.next_clean <= now() and t.end_at is null;
CodePudding user response:
you can do it with FIRST_VALUE() window functions:
https://www.postgresqltutorial.com/postgresql-window-function/postgresql-first_value-function/
https://www.postgresql.org/docs/current/tutorial-window.html#:~:text=A window function performs a,done with an aggregate function.
https://www.postgresql.org/docs/current/functions-window.html