Home > Back-end >  DISTINCT ON, but get the second row, not the first
DISTINCT ON, but get the second row, not the first

Time:07-01

I'm using Postgresql I want to select 1 row for every unique id, I have this query:

SELECT DISTINCT ON (station_id) *
FROM readings
ORDER BY station_id,reading_id DESC;

I have 2 tables:

Readings(content isn't important, i hope):

station_id  reading_id    temp   air_pressure
--------------------------------------------
  147          100         10       800
  148          101         20       850
  149          102         30       900
  148          103         40       950
  148          104         50       1000
  147          105         60       1050

Stations(content not important i hope):

id   station_name   lat   lng
-----------------------------
147      
148     
149    

so station_id in readings and station.id in stations should be the same.

So, it works but now I want to select not the newest one (highest id), but the second newest one. Like ignoring first row (newest reading) and just select the second row.

I want to get something like this:

station_id   reading_id   temp   air_pressure
----------------------------------------------
   148          103        40        950        
   147          100        10        800

I want the second highest reading_id for every station_id (second newest reading)

would be nice if it won't select the ones where there is only one row, but not necessary

I'm not sure how can i achieve this. I've tried so much with offset, limit etc. also tried some nested select queries ...

I want to use distinct on because I need to select second row from every unique id. But if it's not possible with distinct on, it's alright.

I'm not sure if I have to group by it somehow or order by ...

CodePudding user response:

The best query heavily depends on undisclosed information.

Assuming:

  • current Postgres 14
  • large table
  • many rows per station_id
  • no duplicates on (station_id, reading_id)
  • a station table with 1 row per station
  • reading_idis defined NOT NULL.
SELECT r.*
FROM   station s
CROSS  JOIN LATERAL (
   SELECT *
   FROM   readings r
   WHERE  r.station_id = s.station_id
   ORDER  BY reading_id DESC
   OFFSET 1
   LIMIT  1
   ) r
ORDER  BY r.station_id;  -- optional

You get no row for stations with less than two rows.

This is very efficient for more than a few rows per station and an index on readings(station_id, reading_id DESC).

If there can be duplicates, and/or there are only few rows per group, this might be better:

SELECT (r).*
FROM  (
   SELECT r, dense_rank() OVER (PARTITION BY station_id ORDER BY reading_id DESC) AS rnk
   FROM   readings r
   ) r
WHERE  rnk = 2;

You get no row for stations with less than two distinct reading_id.

Related:

If you don't have a separate station table (like you probably should), see:

  • Related