Home > front end >  `ffill` equivalent in SQL
`ffill` equivalent in SQL

Time:10-29

Assume I have the following table:

day0, measurement0
day1, null
day2, measurement1
day3, null
day4, null

I want to get something like this:

day0, measurement0
day1, measurement0
day2, measurement1
day3, measurement1
day4, measurement1

I've seen other solution to this problem, e.g. How can I fill down values in postgres but I don't get it. Even if I understood it, those solutions are too complex for everyday use, and after few months nobody will understand such code.

Is there anything simple like Pandas df.ffill() for SQL (especially for PostgreSQL)?

CodePudding user response:

Solution 1 : using a window function

You can use a window function while ordering the days by descendent order, selecting the range from the current row and all the next rows (in the descendent order), filtering on the non null values only, and selecting the first non null value that is found :

SELECT day, (array_agg(measurement) FILTER (WHERE measurement IS NOT NULL) OVER (ORDER BY day DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))[1]
  FROM mytable
 ORDER BY day

The window function first_value(measurement) would be perfect but it can't be used with a FILTER clause, so it is replaced by the aggregate function array_agg(measurement) whose first element is the only one selected.

Solution 2 : using a specific aggregate function

This solution comes from https://patternmatchers.wordpress.com/2021/06/11/ignore-nulls-in-postgres/

CREATE OR REPLACE FUNCTION coalesce_sfunc(x anyelement, y anyelement)
    returns anyelement immutable parallel safe language sql AS
$$
select coalesce(x, y);
$$ ;

CREATE OR REPLACE AGGREGATE find_last_ignore_nulls(anyelement) (
    sfunc = coalesce_r_sfunc,
    stype = anyelement
);

SELECT day, find_last_ignore_nulls(measurement)
  FROM mytable
 ORDER BY day

Both solutions have the same query plan, so no big difference in terms of performance.

see dbfiddle

  • Related