I'm using Postgresql I want to select 1 row for every unique id, I have this query:
SELECT DISTINCT ON (station_id) *
FROM readings
ORDER BY station_id,reading_id DESC;
I have 2 tables:
Readings(content isn't important, i hope):
station_id reading_id temp air_pressure
--------------------------------------------
147 100 10 800
148 101 20 850
149 102 30 900
148 103 40 950
148 104 50 1000
147 105 60 1050
Stations(content not important i hope):
id station_name lat lng
-----------------------------
147
148
149
so station_id in readings and station.id in stations should be the same.
So, it works but now I want to select not the newest one (highest id), but the second newest one. Like ignoring first row (newest reading) and just select the second row.
I want to get something like this:
station_id reading_id temp air_pressure
----------------------------------------------
148 103 40 950
147 100 10 800
I want the second highest reading_id for every station_id (second newest reading)
would be nice if it won't select the ones where there is only one row, but not necessary
I'm not sure how can i achieve this. I've tried so much with offset
, limit
etc. also tried some nested select queries ...
I want to use distinct on because I need to select second row from every unique id. But if it's not possible with distinct on, it's alright.
I'm not sure if I have to group by
it somehow or order by
...
CodePudding user response:
The best query heavily depends on undisclosed information.
Assuming:
- current Postgres 14
- large table
- many rows per
station_id
- no duplicates on
(station_id, reading_id)
- a
station
table with 1 row per station reading_id
is definedNOT NULL
.
SELECT r.*
FROM station s
CROSS JOIN LATERAL (
SELECT *
FROM readings r
WHERE r.station_id = s.station_id
ORDER BY reading_id DESC
OFFSET 1
LIMIT 1
) r
ORDER BY r.station_id; -- optional
You get no row for stations with less than two rows.
This is very efficient for more than a few rows per station and an index on readings(station_id, reading_id DESC)
.
If there can be duplicates, and/or there are only few rows per group, this might be better:
SELECT (r).*
FROM (
SELECT r, dense_rank() OVER (PARTITION BY station_id ORDER BY reading_id DESC) AS rnk
FROM readings r
) r
WHERE rnk = 2;
You get no row for stations with less than two distinct reading_id
.
Related:
If you don't have a separate station table (like you probably should), see: