Is there a way we can add an extra group by to the toolkit_experimental.interpolated_average function? Say my data has power measurements for different sensors; how would I add a group by on the sensor_id?
with s as (
    select sensor_id,
           time_bucket('30 minutes', timestamp) bucket,
           time_weight('LOCF', timestamp, value) agg
    from measurements m
             inner join sensor_definition sd on m.sensor_id = sd.id
    where asset_id = '<battery_id>' and sensor_name = 'power' and
          timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
    group by sensor_id, bucket)
select sensor_id,
       bucket,
       toolkit_experimental.interpolated_average(
           agg,
           bucket,
           '30 minutes'::interval,
           lag(agg) over (order by bucket),
           lead(agg) over (order by bucket)
       )
from s
group by sensor_id;
The above query does not work, as I'd need to add bucket and agg as group by columns as well.
You can find the relevant schemas below.
create table measurements
(
    sensor_id uuid not null,
    timestamp timestamp with time zone not null,
    value double precision not null
);

create table sensor_definition
(
    id uuid default uuid_generate_v4() not null primary key,
    asset_id uuid not null,
    sensor_name varchar(256) not null,
    sensor_type varchar(256) not null,
    unique (asset_id, sensor_name, sensor_type)
);
Any suggestions?
CodePudding user response:
This is a great question and a cool use case. There's definitely a way to do this! I like your CTE at the top, though I prefer to name CTEs a little more descriptively. The join looks good for selection, and at some point in the future you could quite easily sub out the "on-the-fly" aggregation for a continuous aggregate and just do the same join against the continuous aggregate...so that's great!
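As a rough illustration of that continuous-aggregate idea (not part of the original question), it could look something like the sketch below. The view name weighted_sensor_cagg is my own, and it assumes measurements has been made a hypertable and that your TimescaleDB / Toolkit versions support time_weight inside a continuous aggregate:

-- Hypothetical sketch: a continuous aggregate replacing the on-the-fly CTE.
-- Assumes measurements is a hypertable and that time_weight is allowed in
-- continuous aggregates on your TimescaleDB / Toolkit versions.
CREATE MATERIALIZED VIEW weighted_sensor_cagg
WITH (timescaledb.continuous) AS
SELECT
    sensor_id,
    time_bucket('30 minutes', timestamp) AS bucket,
    time_weight('LOCF', timestamp, value) AS agg
FROM measurements
GROUP BY sensor_id, bucket;
-- The join against sensor_definition (and the interpolated_average call shown
-- below) would then run against weighted_sensor_cagg instead of the CTE.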
The only thing you need to do is modify the window clause of the lead and lag functions so that they work over each sensor's rows (PARTITION BY sensor_id) rather than the full ordered data set, and then you don't need a group by clause in the outer query at all!
WITH weighted_sensor AS (
    SELECT
        sensor_id,
        time_bucket('30 minutes', timestamp) bucket,
        time_weight('LOCF', timestamp, value) agg
    FROM
        measurements m
        INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
    WHERE
        asset_id = '<battery_id>' AND sensor_name = 'power' AND
        timestamp BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
    GROUP BY sensor_id, bucket
)
SELECT
    sensor_id,
    bucket,
    toolkit_experimental.interpolated_average(
        agg,
        bucket,
        '30 minutes'::interval,
        lag(agg) OVER (PARTITION BY sensor_id ORDER BY bucket),
        lead(agg) OVER (PARTITION BY sensor_id ORDER BY bucket)
    )
FROM weighted_sensor;
You can also split out the window clause into a separate, named clause in the query. This helps especially if you're using it more than once; so if you were also to use the integral function, for instance, to get total energy utilization in a period, you might do something like this:
WITH weighted_sensor AS (
    SELECT
        sensor_id,
        time_bucket('30 minutes', timestamp) bucket,
        time_weight('LOCF', timestamp, value) agg
    FROM
        measurements m
        INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
    WHERE
        asset_id = '<battery_id>' AND sensor_name = 'power' AND
        timestamp BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
    GROUP BY sensor_id, bucket
)
SELECT
    sensor_id,
    bucket,
    toolkit_experimental.interpolated_average(
        agg,
        bucket,
        '30 minutes'::interval,
        lag(agg) OVER sensor_times,
        lead(agg) OVER sensor_times
    ),
    toolkit_experimental.interpolated_integral(
        agg,
        bucket,
        '30 minutes'::interval,
        lag(agg) OVER sensor_times,
        lead(agg) OVER sensor_times,
        'hours'
    )
FROM weighted_sensor
WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket);
I used hours as the unit as I figure energy is often measured in watt-hours or the like...
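If what you ultimately want is a single energy total per sensor for the whole period rather than one value per bucket, you could wrap the query above and sum the per-bucket integrals. A minimal sketch follows; the CTE name per_bucket and the aliases energy_wh / total_energy_wh are my own, and calling the result watt-hours assumes value is recorded in watts:

WITH weighted_sensor AS (
    SELECT
        sensor_id,
        time_bucket('30 minutes', timestamp) bucket,
        time_weight('LOCF', timestamp, value) agg
    FROM
        measurements m
        INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
    WHERE
        asset_id = '<battery_id>' AND sensor_name = 'power' AND
        timestamp BETWEEN '2023-01-05 23:30:00' AND '2023-01-07 00:30:00'
    GROUP BY sensor_id, bucket
), per_bucket AS (
    -- Per-bucket energy, interpolated across bucket boundaries as above.
    SELECT
        sensor_id,
        toolkit_experimental.interpolated_integral(
            agg,
            bucket,
            '30 minutes'::interval,
            lag(agg) OVER sensor_times,
            lead(agg) OVER sensor_times,
            'hours'
        ) AS energy_wh
    FROM weighted_sensor
    WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket)
)
-- Sum the interpolated per-bucket energies into one total per sensor.
SELECT sensor_id, sum(energy_wh) AS total_energy_wh
FROM per_bucket
GROUP BY sensor_id;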