Home > Enterprise >  SQL selecting first record per group
SQL selecting first record per group

Time:02-22

I have a table that looks like this:

CREATE TABLE UTable (
    m_id        TEXT PRIMARY KEY,
    u1          TEXT,
    u2          TEXT,
    u3          TEXT,
    -- other stuff, as well as
    gid         INTEGER,
    gt          TEXT,
    d           TEXT,
    timestamp   TIMESTAMP
);

CREATE TABLE OTable (
    gid         INTEGER,
    gt          TEXT,
    d           TEXT,
    -- other stuff, such as
    n           INTEGER
);
CREATE UNIQUE INDEX OTable_idx ON OTable (gid, gt, d);

For each record in OTable that matches a condition (fixed values of gid, gt), I want to join the corresponding record in UTable with the minimum timestamp.

What's catching me is that in my final result I don't care about the timestamp, I clearly need to group on d (since gid and gt are fixed), and yet I do need to extract u1, u2, u3 from the selected record.

SELECT o.d, u.u1, u.u2, u.u3, o.n
    FROM UTable u
INNER JOIN OTable o
  ON u.gid = o.gid AND u.gt = o.gt AND u.d = o.d
WHERE u.gid = 3 AND u.gt = 'dog night'
GROUP BY u.d
-- and u.timestamp is the minimum for each group
;

I think my first step should be just to do the select on UTable and then I can join against that. But even there I'm a bit confused.

SELECT u.d, u.u1, u.u2, u.u3
    FROM UTable u
WHERE u.gid = 3 AND u.gt = 'dog night';

I want to add HAVING MIN(u.timestamp), but that's not valid.

Any pointers as to what I need to do?

I did see this question, but it isn't quite what I need, since I can't group on all the UTable values lest I select too many things.

CodePudding user response:

GROUP BY u.d (without also listing u1, u2, u3) would only work if u.d was the PRIMARY KEY (which it is not, and also wouldn't make sense in your scenario). See:

I suggest DISTINCT ON in a subquery on UTable instead:

SELECT o.d, u.u1, u.u2, u.u3, o.n
FROM  (
   SELECT DISTINCT ON (u.d)
          u.d, u.u1, u.u2, u.u3
   FROM   UTable u
   WHERE  u.gid = 3
   AND    u.gt = 'dog night'
   ORDER  BY u.d, u.timestamp
   ) u
JOIN   OTable o USING (gid, gt, d);

See:

If UTable is big, at least a multicolumn index on (gid, gt) is advisable. Same for OTable.
Maybe even on (gid, gt, d). Depends on data types, cardinalities, ...

  • Related