I have a RoadInsp table in Oracle 18c. I'll put the data in a CTE for purpose of this post:
with roadinsp (objectid, asset_id, date_) as (
select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
)
select * from roadinsp
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
1 1 2016-04-01
2 1 2019-03-01
3 1 2022-01-01 --select this row
4 2 2016-04-01
5 2 2021-01-01 --select this row
6 3 2022-03-01 --select this row
7 3 2016-04-01
8 3 2018-03-01
9 3 2013-03-01
10 3 2010-06-01
I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.
I want to select the greatest n per group using the WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.
As an experiment, I want to make the selection specifically using the EXISTS operator.
The reason being: While this post technically pertains to Oracle (since that's what S.O. community members would have access to), in practice, I want to use the logic in a proprietary database called a file geodatabase. The file geodatabase has very limited SQL support; a small subset of SQL-92 syntax. But it does seem to support EXISTS and subqueries, although not correlated subqueries, group by, joins, or any modern SQL syntax. Very frustrating.
SQL reference for query expressions used in ArcGIS
Subquery support in file geodatabases is limited to the following:
Scalar subqueries with comparison operators. A scalar subquery returns a single value, for example:
GDP2006 > (SELECT MAX(GDP2005) FROM countries)
For file geodatabases, the set functions AVG, COUNT, MIN, MAX, and SUM can only be used in scalar subqueries.
EXISTS predicate, for example:
EXISTS (SELECT * FROM indep_countries WHERE COUNTRY_NAME = 'Mexico')
Question:
Using the EXISTS operator, is there a way to select the greatest n per group? (keeping in mind the limitations mentioned above)
CodePudding user response:
rank
analytic function does the job, if it is available to you (Oracle 18c certainly does support it).
Sample data:
SQL> with roadinsp (objectid, asset_id, date_) as (
2 select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
3 select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
4 select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
5 select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
6 select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
7 select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
8 select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
9 select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
10 select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
11 select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
12 ),
Query begins here: first rank rows per asset_id
by date in descending order:
13 temp as
14 (select r.*,
15 rank() over (partition by asset_id order by date_ desc) rnk
16 from roadinsp r
17 )
Finally, fetch rows that rank as the highest:
18 select *
19 from temp
20 where rnk = 1;
OBJECTID ASSET_ID DATE_ RNK
---------- ---------- ---------- ----------
3 1 2022-01-01 1
5 2 2021-01-01 1
6 3 2022-03-01 1
SQL>
If you can't use that, how about a subquery?
<snip>
13 select r.objectid, r.asset_id, r.date_
14 from roadinsp r
15 where (r.asset_id, r.date_) in (select t.asset_id, t.max_date
16 from (select a.asset_id, max(a.date_) max_date
17 from roadinsp a
18 group by a.asset_id
19 ) t
20 );
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
6 3 2022-03-01
5 2 2021-01-01
3 1 2022-01-01
SQL>
CodePudding user response:
This should work:
select *
from roadinsp r
where not exists (select 1
from roadinsp pr
where r.asset_id=pr.asset_id
and r.date_< pr.date_)
We get only the entry for each asset_id if there is no (NOT EXISTS) a later (by date) row; only the last rows of each asset satisfy this.