Let's say I have a table MyTable
with the following 5 columns
Key OID Date1 Date2 Val
0 A 2020-01-01 2020-02-10 3
1 A 2020-01-01 2020-02-09 6
2 A 2020-01-02 2020-02-09 4
3 A 2020-01-03 2020-02-04 3
3 A 2020-01-03 2020-02-02 1
4 B 2020-01-02 2020-02-01 3
5 B 2020-01-03 2020-02-01 3
6 B 2020-01-03 2020-02-01 3
7 B 2020-01-04 2020-02-05 9
8 B 2020-01-04 2020-02-08 6
9 B 2020-01-04 2020-02-07 2
For each unique value in the OID
column I want to
- extract the subset of records corresponding to this OID value
- then extract the subset of those records having the maximum date in Date1
- then extract the unique record having the (guaranteed) maximum date in Date2
For the example above this would result in
So the result would be
Key OID Date1 Date2 Val
3 A 2020-01-03 2020-02-04 3
8 B 2020-01-04 2020-02-08 6
I am new to SQL and struggling to think of an elegant solution. In the spirit of having a go before posting on SO here is my attempt. It is ugly. I am certain it could be more succinct and idiomatic. I just don't have the skill at this time.
WITH max_date1 (OID, Date1) AS
(
SELECT OID, MAX(Date1)
FROM MyTable
GROUP BY OID
),
max_date2 (OID, Date2) AS
(
SELECT max_date1.OID, MAX(MyTable.Date2)
FROM max_date1
LEFT JOIN MyTable
ON t2.OID = max_date1.OID
WHERE max_date1.Date1 = MyTable.Date1
GROUP BY max_date1.OID
).
max_dates (OID, Date1, Date2) AS
(
SELECT *
FROM max_date1
LEFT JOIN max_date2
USING (OID)
)
SELECT *
FROM MyTable
LEFT JOIN max_dates
ON (
MyTable.OID = max_dates.OID
AND MyTable.Date1 = max_dates.Date1
AND MyTable.Date2 = max_dates.Date2
)
Any improvements would be much appreciated. Thanks!
CodePudding user response:
I would use window functions as follows:
SELECT KEY
, OID
, Date1
, Date2
, Val
FROM (
SELECT KEY
, OID
, Date1
, Date2
, Val
, max(Date1) OVER (PARTITION BY oid) AS max_date1
, max(Date2) OVER (PARTITION BY oid, date1) AS max_date2
, row_number() OVER (PARTITION BY oid, date1 order by date2 DESC) AS rn
FROM mytable
) AS d
WHERE date1 = max_date1
AND date2 = max_date2
AND rn = 1
The date comparisons should locate the subset within subset, and row_number() is used to ensure just one row from each is returned.
If you prefer to use CTE's then just re-arrange the derived table into a CTE:
WITH CTE AS (
SELECT KEY
, OID
, Date1
, Date2
, Val
, max(Date1) OVER (PARTITION BY oid) AS max_date1
, max(Date2) OVER (PARTITION BY oid, date1) AS max_date2
, row_number() OVER (PARTITION BY oid, date1 order by date2 DESC) AS rn
FROM mytable
)
SELECT KEY
, OID
, Date1
, Date2
, Val
FROM CTE
WHERE date1 = max_date1
AND date2 = max_date2
AND rn = 1
CodePudding user response:
select a.* from MyTable a
inner join
(
select oid, max(Date1),max(Date2) from MyTable
group by oid
) as b
on (a.oid=b.oid and a.date1=b.date1 and a.date2=b.date2)
CodePudding user response:
An approach with nested groups:
select
t.*
from
MyTable t
inner join (
select
g.OID,
g.max_Date1,
max(t.Date2) max_Date2
from
MyTable t
inner join (
select OID, max(Date1) max_Date1
from MyTable
group by OID
) g on g.OID = t.OID and g.max_Date1 = t.Date1
group by
t.OID
) g on g.OID = t.OID and g.max_Date1 = t.Date1 and g.max_Date2 = t.Date2;
result with your sample data:
Id | OID | Date1 | Date2 | Val |
---|---|---|---|---|
3 | A | 2020-01-03 | 2020-02-04 | 3 |
8 | B | 2020-01-04 | 2020-02-08 | 6 |
An index (or, if you can, a clustered index) over (OID, Date1, Date2)
helps here.