I have a PostgreSQL database with time series data. To check if the data quality especially whether the sensors are writing to the database I implemented a SQL view which gives me the average of the last hour and I can see the data with this code:
SELECT avg(value) AS avg,
location
name
FROM measurement
WHERE datetime >= (now() - '01:00:00'::interval)
GROUP BY location, name
The time serie data table looks like this:
datetime value location name
0 2018-01-01 13:35:00 01:00 0.22 HLN NO2
1 2018-01-01 13:35:00 01:00 0.31 HLN CO
2 2018-01-01 13:35:00 01:00 1.15 HLN NO
3 2018-01-01 13:40:00 01:00 1.80 AIS NO2
4 2018-01-01 13:40:00 01:00 2.60 AIS CO
5 2018-01-01 13:40:00 01:00 2.30 AIS NO
6 2018-01-01 13:45:00 01:00 2.25 HLN NO2
7 2018-01-02 13:45:00 01:00 2.14 HLN CO
8 2018-01-02 13:45:00 01:00 2.96 HLN NO
9 2018-01-02 14:35:00 01:00 0.76 HLN NO2
10 2018-01-02 14:35:00 01:00 0.80 HLN CO
11 2018-01-02 14:35:00 01:00 1.19 HLN NO
12 2018-01-02 14:40:00 01:00 1.10 AIS NO2
13 2018-01-02 14:40:00 01:00 2.87 AIS CO
14 2018-01-02 14:40:00 01:00 2.80 AIS NO
15 2018-01-02 14:45:00 01:00 3.06 HLN NO2
16 2018-01-02 13:45:00 01:00 2.86 HLN CO
17 2018-01-02 13:45:00 01:00 2.22 HLN NO
Since I have a huge database and some sensors stopped writing data 2 weeks ago, from time to time I check the MAX(datetime)
for each location to see which one sent the last time the data.
Now my question is, can I expand my sql query to also include the max time for each location?
I know for example from the initial view that 2 sensors don't send data to the database, since they are not visible in the last hour of the data and I know from the MAX(datetime)
that they stopped 2 weeks ago to send data.
CodePudding user response:
Solution with LATERAL JOIN
SELECT avg(m.value) AS avg
, m.location
, m.name
, s.max_datetime
FROM measurement AS m
CROSS JOIN LATERAL
(
SELECT max(datetime) AS max_datetime
FROM measurement AS p
WHERE p.location = m.location
AND p.name = m.name
) AS s
WHERE m.datetime >= ('2018-01-02 14:40:00 01:00' :: timestamp - interval '1 hour')
GROUP BY m.location, m.name, s.max_datetime
Solution with JOIN
SELECT avg.*, max.max_datetime
FROM
( SELECT avg(value) AS avg
, location
, name
FROM measurement
WHERE datetime >= '2018-01-02 14:40:00 01:00' :: timestamp - interval '1 hour'
GROUP BY location, name
) AS avg
INNER JOIN
( SELECT location
, name
, max(datetime) AS max_datetime
FROM measurement
GROUP BY location, name
) AS max
ON avg.location = max.location
AND avg.name = max.name
The EXPLAIN ANALYSE shows that solution with JOIN is faster.
See the results in dbfiddle