How can I calculate an average of rows at a 15 minutes interval on POSTGRESQL


I have searched for days for my problem but could not find an answer, so I decided to post my case here. I have a server that records IoT data for different transformer equipment via an MQTT server. Each piece of equipment has a station, line_name, mw, amp, time (in epoch), hour, minute, seconds, and date. A station has several line_names with different readings in each row, and there are several stations, each with several line_names. The task I have been trying to carry out is to get the average mw for each line_name and return rows with the 15-minute average of each line_name of a station; the server stores a row for each line every 4 or 5 seconds, depending on when the IoT server receives the input. I am not strong at the database end, but I understand some basic queries, and this search has made me learn more. Below is one of the queries I tried, which did not work:

SELECT 
   avg(mw), 
   station,
   generate_series(1646676821000, 1646676841000, 5000)  as interval_alias
FROM lines_table 
WHERE station = 'station_name' AND line_name = 'tr1'
GROUP BY interval_alias, station

This didn't work. How can I query a 15-minute interval average of the mw parameter for each line_name? Thanks in advance.

CodePudding user response:

For this kind of grouping, the simplest approach is to derive a group key from your timestamp column.

Let's make some test data:

CREATE UNLOGGED TABLE foo (t TIMESTAMP NOT NULL, d FLOAT);
INSERT INTO foo SELECT '2020-01-01'::TIMESTAMP + '5 SECOND'::INTERVAL*n, random()
    FROM generate_series(1,100000) n;

Now, to build a group key for every 15-minute slice, truncate the timestamp to the hour with date_trunc, extract the minutes, round them down to a 15-minute boundary, and rebuild a timestamp from that.

SELECT date_trunc('HOUR',t) + floor(date_part('MIN',t)/15)*'15 min'::INTERVAL AS slice, count(*), avg(d) FROM foo GROUP BY slice ORDER BY 1 LIMIT 10;
        slice        | count |         avg
---------------------+-------+---------------------
 2020-01-01 00:00:00 |   179 |  0.4869196677564143
 2020-01-01 00:15:00 |   180 |   0.493655494562419
 2020-01-01 00:30:00 |   180 | 0.47648568624182625
 2020-01-01 00:45:00 |   180 |  0.5192120204018761
 2020-01-01 01:00:00 |   180 |  0.5132500835069955
 2020-01-01 01:15:00 |   180 |  0.4752729458694133
 2020-01-01 01:30:00 |   180 |  0.4959755133838442
 2020-01-01 01:45:00 |   180 | 0.48427990508883934
 2020-01-01 02:00:00 |   180 |  0.5272323624260469
 2020-01-01 02:15:00 |   180 |  0.5286562389017389

The advantage of this method is that it can use a simple sequential scan on the table; it doesn't need an index on the timestamp column to fetch each slice.
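Applied to the table from the question, the same slice key might look something like the sketch below. This is an illustration only: it assumes the column names given in the question and that time holds epoch milliseconds (to_timestamp expects seconds, hence the division by 1000).

SELECT
   station,
   line_name,
   -- time is assumed to hold epoch milliseconds
   date_trunc('HOUR', to_timestamp(time / 1000.0))
     + floor(date_part('MIN', to_timestamp(time / 1000.0)) / 15) * '15 min'::INTERVAL AS slice,
   avg(mw) AS avg_mw
FROM lines_table
WHERE station = 'station_name'
GROUP BY station, line_name, slice
ORDER BY line_name, slice;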

CodePudding user response:

That should be simple using the date_bin function:

SELECT 
   avg(mw), 
   station,
   date_bin('15 minutes', timestamp, '2000-01-01 00:00:00') AS bin_15_min
FROM lines_table
WHERE station = 'station_name' AND line_name = 'tr1'
GROUP BY bin_15_min, station;

That requires that you

  • use PostgreSQL v14 or later

  • have the timestamp stored as data type timestamp or timestamp with time zone

To drive home the lesson that it is not a good idea to store timestamps as numbers, I leave the conversion of the numbers to proper timestamps as an exercise for the reader. It is pretty straightforward, but of course it makes the query harder to read and somewhat slower.
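For reference, a rough sketch of what that conversion could look like, assuming the time column holds epoch milliseconds as in the question's generate_series bounds (to_timestamp takes seconds, hence the division by 1000):

SELECT
   avg(mw),
   station,
   -- time is assumed to be epoch milliseconds; to_timestamp() expects seconds
   date_bin('15 minutes', to_timestamp(time / 1000.0), TIMESTAMPTZ '2000-01-01 00:00:00') AS bin_15_min
FROM lines_table
WHERE station = 'station_name' AND line_name = 'tr1'
GROUP BY bin_15_min, station;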
