Home > Enterprise >  How to remove values from an array that exist in another array?
How to remove values from an array that exist in another array?

Time:02-22

In PostgreSQL I have a table (Table) that contains an id column (ID) and another column (Values) that contains an array of strings. I have a select query (SelectQuery) that gets me an ID that matches Table.ID, as well as an array of values (RemoveValues). I would like to now remove from the Values array, any strings that are contained in the RemoveValues array that match on ID, updating Table.

--Table--

ID            Values
--  ----------------------------
1   {string1, string2, string3}
2   {string1, string2, string3}
...

--SelectQuery--

ID          RemoveValues
--  ----------------------------
1   {string2}
2   {string1, string2}
...

Execute query that I can't figure out

--Table--

ID            Values
--  ----------------------------
1   {string1, string3}
2   {string3}
...

I can't seem to figure out a good way to do this, I've played with a number of joins and the sql array methods and haven't found anything that works. Is there's a way to loop the remove() method for each element in RemoveValues? I've also tried something like this with no luck:

select array_agg(elem)
from Table, unnest(Table.Values) elem
where elem <> all(
    SelectQuery )
);

I'm feeling like my remaining option is a bash script, unless anyone can suggest a path forward using SQL?

CodePudding user response:

There are many ways.

First, you have to define whether there can be duplicates or NULL values in either of the arrays - and how to deal with those if possible. And whether to preserve original order of elements.

Assuming either is possible, and these are the rules:

  • Count duplicates separately. So if Values has five elements 'foo' and RemoveValues has three elements 'foo', thee will be two element 'foo' in the result.

  • Treat NULL values equal. So NULL can be removed with NULL (though in other contexts NULL = NULL yields NULL).

  • Order of array elements does not have to be preserved.

  • <SelectQuery> produces unique ID values.

That matches the behavior of standard SQL EXCEPT ALL. See:

So:

SELECT t.*, sub.result_array
FROM   tbl t
JOIN   (<SelectQuery>) s USING ("ID")
CROSS  JOIN LATERAL (
   SELECT ARRAY (
      SELECT unnest(t."Values")
      EXCEPT ALL
      SELECT unnest(s."RemoveValues")
      )
   ) sub(result_array);

About the LATERAL subquery:

Can be plain CROSS JOIN because an ARRAY constructor always produces a row.

You could wrap the functionality in a custom function:

CREATE FUNCTION f_arr_except_arr(a1 text[], a2 text[])
  RETURNS text[]
  LANGUAGE SQL IMMUTABLE PARALLEL SAFE
BEGIN ATOMIC
SELECT ARRAY (SELECT unnest(a1) EXCEPT ALL SELECT unnest(a2));
END;

Requires Postgres 14. See:

For older (or any) Postgres versions, and for any array type:

CREATE OR REPLACE FUNCTION f_arr_except_arr(a1 anyarray, a2 anyarray)
  RETURNS anyarray
  LANGUAGE SQL IMMUTABLE PARALLEL SAFE AS
$func$
SELECT ARRAY (SELECT unnest(a1) EXCEPT ALL SELECT unnest(a2));
$func$;

db<>fiddle here

(PARALLEL SAFE only for Postgres 9.6 or later, though.)

Then your query can be something like:

SELECT *, f_arr_except_arr(t."Values", s."RemoveValues") AS result_values
FROM   tbl t
JOIN  (<SelectQuery>) s USING ("ID");

It's unclear whether you actually locked in CaMeL-case spelling with double-quotes. I assumed as much, but better you don't. See:

And your UPDATE can be:

UPDATE tbl t
SET    "Values" = f_arr_except_arr(t."Values", s."RemoveValues")
FROM   <SelectQuery>) s
WHERE  s."ID" = t."ID"
AND    "Values" IS DISTINCT FROM f_arr_except_arr(t."Values", s."RemoveValues");

The additional WHERE clause is optional to avoid empty updates. See:

  • Related