Home > Software design >  How do I get a subset of a subset of records elegantly with SQL?
How do I get a subset of a subset of records elegantly with SQL?

Time:04-04

Let's say I have a table MyTable with the following 5 columns

Key     OID        Date1       Date2   Val
  0       A   2020-01-01  2020-02-10     3  
  1       A   2020-01-01  2020-02-09     6  
  2       A   2020-01-02  2020-02-09     4  
  3       A   2020-01-03  2020-02-04     3  
  3       A   2020-01-03  2020-02-02     1  
  4       B   2020-01-02  2020-02-01     3  
  5       B   2020-01-03  2020-02-01     3  
  6       B   2020-01-03  2020-02-01     3  
  7       B   2020-01-04  2020-02-05     9  
  8       B   2020-01-04  2020-02-08     6  
  9       B   2020-01-04  2020-02-07     2

For each unique value in the OID column I want to

  1. extract the subset of records corresponding to this OID value
  2. then extract the subset of those records having the maximum date in Date1
  3. then extract the unique record having the (guaranteed) maximum date in Date2

For the example above this would result in

So the result would be

Key     OID        Date1       Date2    Val
  3       A   2020-01-03  2020-02-04      3  
  8       B   2020-01-04  2020-02-08      6

I am new to SQL and struggling to think of an elegant solution. In the spirit of having a go before posting on SO here is my attempt. It is ugly. I am certain it could be more succinct and idiomatic. I just don't have the skill at this time.

WITH max_date1 (OID, Date1) AS
(
    SELECT OID, MAX(Date1)
    FROM MyTable
    GROUP BY OID
),
max_date2 (OID, Date2) AS
(
    SELECT max_date1.OID, MAX(MyTable.Date2)
    FROM max_date1 
    LEFT JOIN MyTable
    ON t2.OID = max_date1.OID
    WHERE max_date1.Date1 = MyTable.Date1
    GROUP BY max_date1.OID
).
max_dates (OID, Date1, Date2) AS
(
    SELECT *
    FROM max_date1
    LEFT JOIN max_date2
    USING (OID)
)
SELECT *
FROM MyTable
LEFT JOIN max_dates
ON (
    MyTable.OID = max_dates.OID
    AND MyTable.Date1 = max_dates.Date1
    AND MyTable.Date2 = max_dates.Date2
)

Any improvements would be much appreciated. Thanks!

CodePudding user response:

I would use window functions as follows:

SELECT KEY
    , OID
    , Date1
    , Date2
    , Val
FROM (
    SELECT KEY
        , OID
        , Date1
        , Date2
        , Val
        , max(Date1) OVER (PARTITION BY oid) AS max_date1
        , max(Date2) OVER (PARTITION BY oid, date1) AS max_date2
        , row_number() OVER (PARTITION BY oid, date1 order by date2 DESC) AS rn
    FROM mytable
    ) AS d
WHERE date1 = max_date1
    AND date2 = max_date2
    AND rn = 1

The date comparisons should locate the subset within subset, and row_number() is used to ensure just one row from each is returned.

If you prefer to use CTE's then just re-arrange the derived table into a CTE:

WITH CTE AS (
    SELECT KEY
        , OID
        , Date1
        , Date2
        , Val
        , max(Date1) OVER (PARTITION BY oid) AS max_date1
        , max(Date2) OVER (PARTITION BY oid, date1) AS max_date2
        , row_number() OVER (PARTITION BY oid, date1 order by date2 DESC) AS rn
    FROM mytable
    ) 
SELECT KEY
    , OID
    , Date1
    , Date2
    , Val
FROM CTE
WHERE date1 = max_date1
    AND date2 = max_date2
    AND rn = 1

CodePudding user response:

    select a.* from MyTable a
    inner join
    (
    select oid, max(Date1),max(Date2) from MyTable
group by oid
    ) as b
    on (a.oid=b.oid and a.date1=b.date1 and a.date2=b.date2)

CodePudding user response:

An approach with nested groups:

select 
    t.*
from
    MyTable t
    inner join (
        select
            g.OID,
            g.max_Date1,
            max(t.Date2) max_Date2
        from
            MyTable t
            inner join (
                select OID, max(Date1) max_Date1
                from MyTable
                group by OID
            ) g on g.OID = t.OID and g.max_Date1 = t.Date1
        group by
            t.OID
    ) g on g.OID = t.OID and g.max_Date1 = t.Date1 and g.max_Date2 = t.Date2;

result with your sample data:

Id OID Date1 Date2 Val
3 A 2020-01-03 2020-02-04 3
8 B 2020-01-04 2020-02-08 6

An index (or, if you can, a clustered index) over (OID, Date1, Date2) helps here.

  •  Tags:  
  • sql
  • Related