Postgresql sequential add/remove reduction operation-CodePudding

I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example:

line_no | def | undef
--------------------
1       | 'a' | NULL
2       | 'b' | NULL
3       | NULL| 'a'
... 
42      | NULL| 'b'

I want to compute the "live variables" information on every line. Iteratively, I'd just write code like the following pseudocode:

live = [] 
for each r in table:
    if r.def:
        live.append(r.def)
    else:
        live.remove(r.undef)
    store(r.line_no, live)

The expected result is a table like :

line_no | live
1.      | ['a']
2.      | ['a', 'b'] 
3.      | ['b'] 
... 
42.     | []

I managed to write the equivalent sequential loop as a plpgsql function. However, I was wondering if there is a (maybe more elegant) way as a SQL query? I tried various things using lag() but somehow that never resulted in this "reduction" operation I am looking for?

(bonus points if the query can also partition over a field, such as 'filename')

CodePudding user response：

If you want a pure SQL solution, use a recursive query.

with recursive cte as (
    select line_no, array[def] as list
    from my_table
    where line_no = 1
union all
    select 
        t.line_no, 
        case when def is null 
            then array_remove(list, undef)
            else array_append(list, def)
        end
    from my_table t
    join cte c on c.line_no = t.line_no- 1
)
select *
from cte;

However, a more efficient and flexible solution in this case may be creating a function.

create or replace function list_of_vars()
returns table (line_no int, list text[])
language plpgsql as $$
declare
    arr text[];
    rec record;
begin
    for rec in
        select *
        from my_table
        order by line_no
    loop
        if rec.def is null then
            arr := array_remove(arr, rec.undef);
        else
            arr := array_append(arr, rec.def);
        end if;
        line_no := rec.line_no;
        list := arr;
        return next;
    end loop;
end
$$;

Use:

select *
from list_of_vars()
order by line_no

Test it in db<>fiddle.

CodePudding user response：

One way this can be achieved is to use a recursive query which is in the end pretty similar to the iteration you are thinking of.

with recursive lines as (
  select line_no, 
         (case when def is not null then jsonb_build_array(def) else '[]'::jsonb end) - coalesce(undef, '') live
  from the_table
  where line_no = 1
  union all
  select c.line_no, 
         (p.live || case when def is not null then jsonb_build_array(c.def) else '[]'::jsonb end) - coalesce(c.undef, '')
  from the_table c
    join lines p on c.line_no - 1 = p.line_no
)
select *
from lines;

jsonb_build_array will happily include a NULL value, that's why we need the somewhat convoluted CASE expression to turn a single value into an array (with one or zero elements).

The - operator for jsonb removes the element on the right hand side from the array. However an null value on the right hand side, will remove all elements, hence the coalesce()

This requires the values for line_no to not have any gaps (and start with 1). If that is not a case, another step would be required to generate a gap-less number for each row (which would make this even slower)

Online example

I doubt if that is actually faster then a "proper" procedural solution.