Is there a way we can add an extra group by to the toolkit_experimental.interpolated_average function? Say my data has power measurements for different sensors; how would I add a group by on the sensor_id?
with s as (
    select sensor_id,
           time_bucket('30 minutes', timestamp) bucket,
           time_weight('LOCF', timestamp, value) agg
    from measurements m
             inner join sensor_definition sd on m.sensor_id = sd.id
    where asset_id = '<battery_id>' and sensor_name = 'power' and
          timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
    group by sensor_id, bucket)
select sensor_id,
       bucket,
       toolkit_experimental.interpolated_average(
           agg,
           bucket,
           '30 minutes'::interval,
           lag(agg) over (order by bucket),
           lead(agg) over (order by bucket)
       )
from s
group by sensor_id;
The above query does not work, as I'd need to add bucket and agg as group by columns as well.
You can find the relevant schemas below.
create table measurements
(
    sensor_id uuid not null,
    timestamp timestamp with time zone not null,
    value double precision not null
);

create table sensor_definition
(
    id uuid default uuid_generate_v4() not null primary key,
    asset_id uuid not null,
    sensor_name varchar(256) not null,
    sensor_type varchar(256) not null,
    unique (asset_id, sensor_name, sensor_type)
);
Any suggestions?
CodePudding user response:
This is a great question and a cool use case. There's definitely a way to do this! I like your CTE at the top, though I prefer to name CTEs a little more descriptively. The join looks good for selection, and at some point in the future you could quite easily sub out the "on-the-fly" aggregation for a continuous aggregate and just do the same join against the continuous aggregate...so that's great!
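As a rough illustration of that continuous-aggregate idea (not part of the original question), it could look something like the sketch below. The view name weighted_sensor_cagg is my own, and it assumes measurements has been made a hypertable and that your TimescaleDB / Toolkit versions support time_weight inside a continuous aggregate:

-- Hypothetical sketch: a continuous aggregate replacing the on-the-fly CTE.
-- Assumes measurements is a hypertable and that time_weight is allowed in
-- continuous aggregates on your TimescaleDB / Toolkit versions.
CREATE MATERIALIZED VIEW weighted_sensor_cagg
WITH (timescaledb.continuous) AS
SELECT
    sensor_id,
    time_bucket('30 minutes', timestamp) AS bucket,
    time_weight('LOCF', timestamp, value) AS agg
FROM measurements
GROUP BY sensor_id, bucket;
-- The join against sensor_definition (and the interpolated_average call shown
-- below) would then run against weighted_sensor_cagg instead of the CTE.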
The only thing you need to do is modify the window clause of the lead and lag functions so that they work over each sensor's rows (PARTITION BY sensor_id) rather than the full ordered data set, and then you don't need a group by clause in the outer query at all!
WITH weighted_sensor AS (
    SELECT
        sensor_id,
        time_bucket('30 minutes', timestamp) bucket,
        time_weight('LOCF', timestamp, value) agg
    FROM
        measurements m
        INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
    WHERE
        asset_id = '<battery_id>' AND sensor_name = 'power' AND
        timestamp BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
    GROUP BY sensor_id, bucket
)
SELECT
    sensor_id,
    bucket,
    toolkit_experimental.interpolated_average(
        agg,
        bucket,
        '30 minutes'::interval,
        lag(agg) OVER (PARTITION BY sensor_id ORDER BY bucket),
        lead(agg) OVER (PARTITION BY sensor_id ORDER BY bucket)
    )
FROM weighted_sensor;
You can also split out the window clause into a separate, named clause in the query. This helps especially if you're using it more than once; so if you were also to use the integral function, for instance, to get total energy utilization in a period, you might do something like this:
WITH weighted_sensor AS (
    SELECT
        sensor_id,
        time_bucket('30 minutes', timestamp) bucket,
        time_weight('LOCF', timestamp, value) agg
    FROM
        measurements m
        INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
    WHERE
        asset_id = '<battery_id>' AND sensor_name = 'power' AND
        timestamp BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
    GROUP BY sensor_id, bucket
)
SELECT
    sensor_id,
    bucket,
    toolkit_experimental.interpolated_average(
        agg,
        bucket,
        '30 minutes'::interval,
        lag(agg) OVER sensor_times,
        lead(agg) OVER sensor_times
    ),
    toolkit_experimental.interpolated_integral(
        agg,
        bucket,
        '30 minutes'::interval,
        lag(agg) OVER sensor_times,
        lead(agg) OVER sensor_times,
        'hours'
    )
FROM weighted_sensor
WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket);
I used hours as the unit as I figure energy is often measured in watt-hours or the like...
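If what you ultimately want is a single energy total per sensor for the whole period rather than one value per bucket, you could wrap the query above and sum the per-bucket integrals. A minimal sketch follows; the CTE name per_bucket and the aliases energy_wh / total_energy_wh are my own, and calling the result watt-hours assumes value is recorded in watts:

WITH weighted_sensor AS (
    SELECT
        sensor_id,
        time_bucket('30 minutes', timestamp) bucket,
        time_weight('LOCF', timestamp, value) agg
    FROM
        measurements m
        INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
    WHERE
        asset_id = '<battery_id>' AND sensor_name = 'power' AND
        timestamp BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
    GROUP BY sensor_id, bucket
), per_bucket AS (
    -- Per-bucket energy, interpolated across bucket boundaries as above.
    SELECT
        sensor_id,
        toolkit_experimental.interpolated_integral(
            agg,
            bucket,
            '30 minutes'::interval,
            lag(agg) OVER sensor_times,
            lead(agg) OVER sensor_times,
            'hours'
        ) AS energy_wh
    FROM weighted_sensor
    WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket)
)
-- Sum the interpolated per-bucket energies into one total per sensor.
SELECT sensor_id, sum(energy_wh) AS total_energy_wh
FROM per_bucket
GROUP BY sensor_id;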