Home > Back-end >  How to combine query from last 1 hour with max time for each id?
How to combine query from last 1 hour with max time for each id?

Time:12-08

I have a PostgreSQL database with time series data. To check if the data quality especially whether the sensors are writing to the database I implemented a SQL view which gives me the average of the last hour and I can see the data with this code:

SELECT avg(value) AS avg,
       location
       name
FROM measurement
WHERE datetime >= (now() - '01:00:00'::interval)
GROUP BY location, name

The time serie data table looks like this:

    datetime                    value   location   name
0    2018-01-01 13:35:00 01:00   0.22    HLN    NO2
1    2018-01-01 13:35:00 01:00   0.31    HLN    CO
2    2018-01-01 13:35:00 01:00   1.15    HLN    NO
3    2018-01-01 13:40:00 01:00   1.80    AIS    NO2
4    2018-01-01 13:40:00 01:00   2.60    AIS    CO
5    2018-01-01 13:40:00 01:00   2.30    AIS    NO
6    2018-01-01 13:45:00 01:00   2.25    HLN    NO2
7    2018-01-02 13:45:00 01:00   2.14    HLN    CO
8    2018-01-02 13:45:00 01:00   2.96    HLN    NO
9    2018-01-02 14:35:00 01:00   0.76    HLN    NO2
10   2018-01-02 14:35:00 01:00   0.80    HLN    CO
11   2018-01-02 14:35:00 01:00   1.19    HLN    NO
12   2018-01-02 14:40:00 01:00   1.10    AIS    NO2
13   2018-01-02 14:40:00 01:00   2.87    AIS    CO
14   2018-01-02 14:40:00 01:00   2.80    AIS    NO
15   2018-01-02 14:45:00 01:00   3.06    HLN    NO2
16   2018-01-02 13:45:00 01:00   2.86    HLN    CO
17   2018-01-02 13:45:00 01:00   2.22    HLN    NO

Since I have a huge database and some sensors stopped writing data 2 weeks ago, from time to time I check the MAX(datetime) for each location to see which one sent the last time the data.

Now my question is, can I expand my sql query to also include the max time for each location?

I know for example from the initial view that 2 sensors don't send data to the database, since they are not visible in the last hour of the data and I know from the MAX(datetime) that they stopped 2 weeks ago to send data.

CodePudding user response:

Solution with LATERAL JOIN

SELECT avg(m.value) AS avg
     , m.location
     , m.name
     , s.max_datetime
  FROM measurement AS m
CROSS JOIN LATERAL
(
SELECT max(datetime) AS max_datetime
  FROM measurement AS p
 WHERE p.location = m.location
   AND p.name = m.name
) AS s
 WHERE m.datetime >= ('2018-01-02 14:40:00 01:00' :: timestamp - interval '1 hour')
 GROUP BY m.location, m.name, s.max_datetime

Solution with JOIN

SELECT avg.*, max.max_datetime
  FROM
     ( SELECT avg(value) AS avg
            , location
            , name
         FROM measurement
        WHERE datetime >= '2018-01-02 14:40:00 01:00' :: timestamp - interval '1 hour'
        GROUP BY location, name
     ) AS avg
 INNER JOIN
     ( SELECT location
            , name
            , max(datetime) AS max_datetime
         FROM measurement
        GROUP BY location, name
     ) AS max
   ON avg.location = max.location
  AND avg.name = max.name

The EXPLAIN ANALYSE shows that solution with JOIN is faster.

See the results in dbfiddle

  • Related