How to use first_value window function to get last non null value in Postgresql-CodePudding

I want to perform a fill-down activity in PgSQL

DDL:

create table brands 
(
id int,
category varchar(20),
brand_name varchar(20)
);
insert into brands values
(1,'chocolates','5-star')
,(2,null,'dairy milk')
,(3,null,'perk')
,(4,null,'eclair')
,(5,'Biscuits','britannia')
,(6,null,'good day')
,(7,null,'boost')
,(8,'shampoo','h&s')
,(9,null,'dove');

Expected output is:

category	brand_name
chocolates	5-star
chocolates	dairy milk
chocolates	perk
chocolates	eclair
Biscuits	britannia
Biscuits	good day
Biscuits	boost
Shampoo	h&s
Shampoo	dove

I tried using the following script but it doesn't seem to work.

select id,
      first_value(category)
      over(order by case when category is not null then id end desc nulls last) as category,
      brand_name
from brands

Can someone suggest a fix.

In MS SQL the following snippet seems to work fine.

select id,
       first_value (category) IGNORE NULLS
       over(order by id desc
        rows between current row and unbounded following) as category,
       brand_name
FROM brands
ORDER BY id

CodePudding user response：

with cte as (
select id,
       category,
       count(category) over (order by id) as category_id,
       brand_name
  from brands)
select id,
       first_value(category) over (partition by category_id order by id) as category,
       brand_name
  from cte;

UPDATE: added query without CTE per request:

select id,
       (array_agg(category) over (order by id))[max(case when category is null then 0 else id end) over (order by id)] as category,
       brand_name
  from brands;

CodePudding user response：

I think there's nothing wrong with using a CTE (see JHH's answer) and I would prefer that.

Postgres DB's don't provide this IGNORE NULLS concept of SQLServer DB's, so I guess you should stop thinking you will get a nearly identic query for Postgres DB's like in MS SQL.

Anyway, if you don't want to use a CTE or complex subquery, you can define your own function and aggregation and run this.

Function creation:

-- CREATE your function
CREATE FUNCTION yourFunction(STATE anyelement, VALUE anyelement)
    RETURNS anyelement
    IMMUTABLE PARALLEL safe
AS
$$
SELECT COALESCE(VALUE, STATE); -- Replace NULL values here
$$ LANGUAGE SQL;

Aggregate creation using the function:

-- CREATE your aggregate
CREATE AGGREGATE yourAggregate(ANYELEMENT) (
    sfunc = yourFunction, -- Call your function here
    stype = ANYELEMENT
);

Your query using this aggregate:

SELECT id, 
  yourAggregate(category) -- Call your aggregate here
  OVER (ORDER BY id, category), 
  brand_name
FROM brands
ORDER BY id;

Of course, you should rename both function and aggregate and use more meaningful names.

This will produce the same outcome like the CTE version.

Try out: db<>fiddle

If you are keen on defining and using own functions and you will use that often, you could do that.

Otherwise, just use a CTE, that's quite fine. There is no reason to do not use CTE's.

Always be aware that while using own functions, you are on risk of bad performance, so you should check whether this query is too slow.

CodePudding user response：

I am afraid that's not implemented in Postgres (at least up to Postgres 15). The manual about window functions:

The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS.

So you have to use a workaround with a CTE or subquery like JHH suggested, or roll your own window function (which will be comparatively slow).

See (answer to similar question on dba.SE):

Carry over long sequence of missing values with Postgres