I have a RoadInsp table in a SQLite 3.39 database. I've put the data in a CTE for purpose of this question:
with roadinsp (objectid, asset_id, date_, condition) as (
values
(1, 1, '2016-04-01', 20),
(2, 1, '2019-03-01', 19),
(3, 1, '2022-01-01', 18),
(4, 2, '2016-04-01', 17),
(5, 2, '2022-01-01', 16),
(6, 3, '2022-03-01', 15), --duplicate date
(7, 3, '2022-03-01', 14), --duplicate date
(8, 3, '2019-01-01', 13)
)
select * from roadinsp
objectid asset_id date_ condition
1 1 2016-04-01 20
2 1 2019-03-01 19
3 1 2022-01-01 18
4 2 2016-04-01 17
5 2 2022-01-01 16
6 3 2022-03-01 15
7 3 2022-03-01 14
8 3 2019-01-01 13
I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.
I want to select the greatest n per group using a WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.
I can achieve that using a WHERE clause expression like this:
date_ =
(select max(subq.date_) from roadinsp subq where roadinsp.asset_id = subq.asset_id)
objectid asset_id date_ condition
3 1 2022-01-01 18
5 2 2022-01-01 16
6 3 2022-03-01 15
7 3 2022-03-01 14
That works, but it selects two rows for asset #3, since there are two rows with the same date for that asset.
So I want to break the tie by selecting the row that has the highest condition value. It would look like this:
objectid asset_id date_ condition
3 1 2022-01-01 18
5 2 2022-01-01 16
6 3 2022-03-01 15 --this row has a higher condition value than the other duplicate row.
--so the other duplicate row was omitted.
I only ever want to select one row per asset. So if there are duplicate condition values too, then it doesn't matter which one is selected, as long as only one is selected.
Using a WHERE clause subquery, how can I select the greatest n per group, and break ties with the row that has the largest condition?
CodePudding user response:
Perhaps you could add another similar condition to the where clause (as I understand your question, that's all you can modify?):
select *
from roadinsp r
where
date_ = (
select max(subq.date_)
from roadinsp subq
where r.asset_id = subq.asset_id
)
and condition = (
select Max(condition)
from roadinsp c
where c.date_ = r.date_
and c.asset_id = r.asset_id
);
Updated Fiddle
CodePudding user response:
You can avoid the MAX
aggregation, and making it more efficient, by using the ROW_NUMBER
window function. It will generate an increasing ranking value, given a partitioning and an order inside the partition you identify. In your case you need to:
- partition on the "asset_id" field
- order on the "date_" and "condition" fields, descendently for both
Once you have that, it is enough to filter out all rows that have ranking higher than 1 (the rows you want to exclude, that have lower "date_" and "condition" values inside each "asset_id" partition).
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY asset_id ORDER BY date_ DESC, condition DESC) AS rn
from roadinsp
)
SELECT *
FROM cte
WHERE rn = 1
Check the demo here.