I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example:
line_no | def  | undef
--------+------+------
1       | 'a'  | NULL
2       | 'b'  | NULL
3       | NULL | 'a'
...
42      | NULL | 'b'
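For reference, a minimal setup reproducing this sample (the table name my_table is an assumption, matching the first answer below):

create table my_table (
    line_no int primary key,
    def     text,
    undef   text
);

insert into my_table (line_no, def, undef) values
    (1, 'a',  null),
    (2, 'b',  null),
    (3, null, 'a');
-- further rows elided in the sample above; it ends with (42, null, 'b')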
I want to compute the "live variables" information on every line. Iteratively, I'd just write code like the following pseudocode:
live = []
for each r in table:
    if r.def:
        live.append(r.def)
    else:
        live.remove(r.undef)
    store(r.line_no, live)
The expected result is a table like:
line_no | live
--------+-----------
1       | ['a']
2       | ['a', 'b']
3       | ['b']
...
42      | []
I managed to write the equivalent sequential loop as a plpgsql function. However, I was wondering whether there is a (maybe more elegant) way to express this as a plain SQL query. I tried various things using lag(), but that never gave me the "reduction" operation I am looking for.
(Bonus points if the query can also partition over a field, such as filename.)
CodePudding user response:
If you want a pure SQL solution, use a recursive query.
with recursive cte as (
    -- anchor: the live list after the first line
    select line_no, array[def] as list
    from my_table
    where line_no = 1
union all
    -- step: apply each line's def/undef to the previous line's list
    select t.line_no,
           case when t.def is null
                then array_remove(c.list, t.undef)
                else array_append(c.list, t.def)
           end
    from my_table t
    join cte c on c.line_no = t.line_no - 1
)
select *
from cte;
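The bonus requirement (partitioning over a field such as filename) fits the same pattern: carry the partition key through the recursion. A sketch, assuming a filename column and that line_no starts at 1 and is gap-free within each file:

with recursive cte as (
    select filename, line_no, array[def] as list
    from my_table
    where line_no = 1
union all
    select t.filename,
           t.line_no,
           case when t.def is null
                then array_remove(c.list, t.undef)
                else array_append(c.list, t.def)
           end
    from my_table t
    join cte c on c.filename = t.filename
              and c.line_no = t.line_no - 1
)
select *
from cte;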
However, a more efficient and flexible solution in this case may be to create a function: it scans the table once in line_no order instead of joining row by row.
create or replace function list_of_vars()
returns table (line_no int, list text[])
language plpgsql as $$
declare
    arr text[];
    rec record;
begin
    for rec in
        select *
        from my_table
        order by line_no
    loop
        if rec.def is null then
            arr := array_remove(arr, rec.undef);
        else
            arr := array_append(arr, rec.def);
        end if;
        line_no := rec.line_no;
        list := arr;
        return next;   -- emit one output row per input line
    end loop;
end
$$;
Use:
select *
from list_of_vars()
order by line_no;
CodePudding user response:
One way this can be achieved is a recursive query, which is in the end pretty similar to the iteration you are thinking of.
with recursive lines as (
    select line_no,
           (case when def is not null
                 then jsonb_build_array(def)
                 else '[]'::jsonb
            end) - coalesce(undef, '') as live
    from the_table
    where line_no = 1
union all
    select c.line_no,
           (p.live || case when c.def is not null
                           then jsonb_build_array(c.def)
                           else '[]'::jsonb
                      end) - coalesce(c.undef, '')
    from the_table c
    join lines p on c.line_no - 1 = p.line_no
)
select *
from lines;
jsonb_build_array() will happily include a NULL value; that is why we need the somewhat convoluted CASE expression to turn a single value into an array with one or zero elements.
The - operator for jsonb removes the element on the right-hand side from the array. However, a null value on the right-hand side makes the whole result null (the operator is strict), hence the coalesce().
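A quick illustration of both behaviours (the casts are there because jsonb_build_array is variadic and - is overloaded, so the arguments need a known type):

select jsonb_build_array(null::text);    -- [null]: SQL NULL becomes a JSON null element
select '["a","b"]'::jsonb - 'a';         -- ["b"]: removes the matching string element
select '["a","b"]'::jsonb - null::text;  -- NULL: null on the right nullifies the result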
This requires the values for line_no to not have any gaps (and to start with 1). If that is not the case, another step is required to generate a gap-less number for each row (which would make this even slower); see the sketch below.
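A sketch of that extra step, using the same table and column names as above: derive a gap-free rn with row_number() and recurse on it instead of line_no.

with recursive numbered as (
    -- gap-free row number to recurse on
    select row_number() over (order by line_no) as rn,
           line_no, def, undef
    from the_table
),
lines as (
    select rn, line_no,
           (case when def is not null then jsonb_build_array(def) else '[]'::jsonb end)
               - coalesce(undef, '') as live
    from numbered
    where rn = 1
union all
    select c.rn, c.line_no,
           (p.live || case when c.def is not null then jsonb_build_array(c.def) else '[]'::jsonb end)
               - coalesce(c.undef, '')
    from numbered c
    join lines p on c.rn = p.rn + 1
)
select line_no, live
from lines;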
I doubt that this is actually faster than a "proper" procedural solution.