Home > Mobile >  Creating an aggregation function for variability between min and max
Creating an aggregation function for variability between min and max

Time:08-06

I have a table giving weather by city, for example:

CREATE TABLE weather (
    city VARCHAR,
    temp INTEGER
);

I would like to create a custom aggregate function that gives me the difference between the lowest and highest temperature, for example:

SELECT city, variability(temp)   -- same as MAX(weather) - MIN(weather)
FROM weather
GROUP BY city

I think I can do this by maintaining an two-element integer array of min and max temperatures, but am having a tough time jamming this into the Aggregate syntax for Postgres:

CREATE OR REPLACE FUNCTION variability_final(minMax INTEGER[2]) RETURNS numeric AS $$
    SELECT $1[1] - $1[0]
$$ LANGUAGE sql;


CREATE OR REPLACE FUNCTION variability_accum(minMax INTEGER[2], val INTEGER)
RETURNS INTEGER[2] AS $$
    SELECT MIN(val, $1[0]), MAX(val, $1[1])
$$ LANGUAGE sql;

CREATE AGGREGATE variability(numeric, numeric)
(
    INITCOND = [null, null],
    STYPE = numeric[2],
    SFUNC = variability_accum,
    FINALFUNC = variability_final
);

The types are throwing me off a bit here. What would be the correct way to write the above two functions?

CodePudding user response:

You are on the correct path, but not quite.

  1. You need to keep data types consistent. You switch numeric and integer. While these are the same class they are not exactly the same.
  2. Arrays in Postgres are 1-based not 0-based. So the references to $1[0] and $1[1] should be $1[1] and $1[2] respectively.
  3. The functions MIN() and MAX() themselves are aggregate functions (looks like Postgres does not like nested aggregates), but these are the wrong functions anyway as the work on columns. You are looking for the least and greatest functions.
  4. Postgres accepts specifying array dimension, but does not enforce them. So basically they are extra baggage (good perhaps for documentation but not much else).

Taking the above into account we arrive at: (see demo)

create or replace function variability_final(minmax integer[])
  returns integer
 language sql
as $$
    select minmax[2] - minmax[1]
$$ ;


create or replace function variability_accum(minmax integer[], val integer)
 returns integer[] 
 language sql
as $$
    select array [ least(val,minmax[1]), greatest(val, minmax[2]) ]; 
$$;

create aggregate variability( integer )
(   initcond = '{null, null}',
    stype = integer[],
    sfunc = variability_accum,
    finalfunc = variability_final
);

NOTE: I prefer using parameter names to positional numbering ($n). I change to that in the above and in demo.

  • Related