I have searched for days for my unique problem but cannot seem to find the answer, so I decided to post my case here. I have a server that records IoT data for different transformer equipment via an MQTT broker. Each piece of equipment has a station, line_name, mw, amp, time (in epoch), hour, minute, seconds, and date. A station has several line_names with different readings in each row, and there are several stations, each with several line_names. The task I have been trying to carry out is to compute the average mw for each line_name and return rows with the 15-minute average of each line_name of a station. The server stores a row for each line every 4 or 5 seconds, depending on when the IoT server receives the input. I am not strong at the database end, but I understand some basic queries, and this search has taught me more. Below is one of the queries I tried, which did not work:
SELECT
avg(mw),
station,
generate_series(1646676821000, 1646676841000, 5000) as interval_alias
FROM lines_table
where station='station_name' and line_name='tr1'
GROUP BY interval_alias, station
This didn't work. How can I query a 15-minute interval average of each line_name's mw parameter? Thanks in advance.
CodePudding user response:
For this kind of grouping, the simplest approach is to derive a group key from your timestamp column.
Let's make some test data:
CREATE UNLOGGED TABLE foo (t TIMESTAMP NOT NULL, d FLOAT);
INSERT INTO foo SELECT '2020-01-01'::TIMESTAMP + '5 SECOND'::INTERVAL*n, random()
from generate_series(1,100000) n;
Now, to make a group slice key every 15 minutes, split the timestamp with date_trunc, extract the minutes, round down to a 15-minute boundary, and rebuild a timestamp out of that.
SELECT date_trunc('HOUR',t) + floor(date_part('MIN',t)/15)*'1 min'::INTERVAL AS slice, count(*), avg(d) FROM foo GROUP BY slice ORDER BY 1 LIMIT 10;
slice | count | avg
---------------------+-------+---------------------
2020-01-01 00:00:00 | 179 | 0.4869196677564143
2020-01-01 00:01:00 | 180 | 0.493655494562419
2020-01-01 00:02:00 | 180 | 0.47648568624182625
2020-01-01 00:03:00 | 180 | 0.5192120204018761
2020-01-01 01:00:00 | 180 | 0.5132500835069955
2020-01-01 01:01:00 | 180 | 0.4752729458694133
2020-01-01 01:02:00 | 180 | 0.4959755133838442
2020-01-01 01:03:00 | 180 | 0.48427990508883934
2020-01-01 02:00:00 | 180 | 0.5272323624260469
2020-01-01 02:01:00 | 180 | 0.5286562389017389
The advantage of this method is that it can use a simple seq scan on the table. It doesn't need to use an index on date to fetch each slice.
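Applied to the question's table, the same slice-key technique might look like the sketch below. It assumes (from the question, not tested against the real schema) that lines_table has columns station, line_name, mw, and an epoch column time in milliseconds, which is first converted with to_timestamp; here the multiplier is '15 min' so each slice is labelled with its actual quarter-hour boundary.

```sql
-- Sketch only: assumes lines_table(station, line_name, mw, time),
-- with time holding epoch milliseconds as described in the question.
SELECT
    station,
    line_name,
    date_trunc('HOUR', ts)
      + floor(date_part('MIN', ts)/15) * '15 min'::INTERVAL AS slice,
    avg(mw) AS avg_mw
FROM (
    SELECT station, line_name, mw,
           to_timestamp(time / 1000.0) AS ts   -- epoch ms -> timestamptz
    FROM lines_table
) sub
GROUP BY station, line_name, slice
ORDER BY station, line_name, slice;
```

Dropping the WHERE filter and grouping by station and line_name returns one averaged row per line per quarter hour, which is what the question asks for.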
CodePudding user response:
That should be simple using the date_bin function:
SELECT
   avg(mw),
   station,
   date_bin('15 minutes', timestamp, '2000-01-01 00:00:00') AS bin_15_min
FROM lines_table
WHERE station = 'station_name' AND line_name = 'tr1'
GROUP BY bin_15_min, station;
That requires that you

- use PostgreSQL v14 or later
- have timestamp stored as data type timestamp or timestamp with time zone
To drive home the lesson that it is not a good idea to store timestamps as numbers, I leave the conversion of the numbers to proper timestamps as an exercise to the reader. It is pretty straightforward, but of course makes the query harder to read and somewhat slower.
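One possible shape of that conversion, sketched under the assumption that the epoch column is the question's time in milliseconds (to_timestamp takes seconds, hence the division by 1000.0 and a timestamptz result, so the origin is given with an explicit zone):

```sql
-- Sketch: to_timestamp(seconds) converts the epoch-millisecond column
-- (assumed to be named time) into a proper timestamptz for date_bin.
SELECT
    avg(mw),
    station,
    date_bin('15 minutes',
             to_timestamp(time / 1000.0),
             '2000-01-01 00:00:00+00') AS bin_15_min
FROM lines_table
WHERE station = 'station_name' AND line_name = 'tr1'
GROUP BY bin_15_min, station;
```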