Home > Net >  PostgreSQL - Filtering result set by array column
PostgreSQL - Filtering result set by array column

Time:10-07

I have a function which returns a table. One of the columns happens to be a text array. Currently the values in this array column will only ever have at most 2 elements, however there is the instance when the same row will be returned twice with the duplicate row elements in the opposite order. I'm hoping to find a way to only return 1 of these rows and discard the other. To give an example, I run a function as

SELECT * FROM schema.function($1,$2,$3)
WHERE conditions...;

which returns me something like this

ID | ARRAY_COL         | ...
1  | {'Good','Day'}    | ...
2  | {'Day','Good'}    | ...
3  | {'Stuck'}         | ...
4  | {'with'}          | ...
5  | {'array'}         | ...
6  | {'filtering'}     | ...

So in this example, I want to return the whole result set with the exception that I only want either row 1 or 2 as they have the same elements in the array (albeit inverted with respect to each other). I'm aware this is probably a bit of a messy problem, but it's something I need to get to the bottom of. Ideally I would like to stick a WHERE clause at the end of my function call which forced the result set to ignore any array value that had the same elements as a previous row. Pseudo code might be something like

 SELECT * FROM schema.function($1,$2,$3)
 WHERE NOT array_col @> (any previous array_col value);

Any pointers in the right direction would be much appreciated, thanks.

CodePudding user response:

Not sure is the best solution, but could work especially in cases where you might have partially overlapping arrays.

The solution is in 3 steps:

  1. unnest the array_col column and order by id and the item value
select 
    id, 
    unnest(array_col) array_col_val 
from dataset 
order by 
    id, 
    array_col_val
  1. regroup by id, now row with id 1 and 2 have the same array_col value
select id, 
        array_agg(array_col_val) array_col 
    from ordering 
    group by id
  1. Select the min id grouping by array_col
select array_col, min(id) from regrouping group by array_col

Full statement

with dataset as (
    select 1 id, ARRAY['Good','Day'] array_col UNION ALL
    select 2 id, ARRAY['Day','Good'] array_col UNION ALL
    select 3 id, ARRAY['Stuck'] array_col UNION ALL
    select 4 id, ARRAY['with'] array_col UNION ALL
    select 5 id, ARRAY['array_colay'] array_col UNION ALL
    select 6 id, ARRAY['filtering'] array_col
)
, ordering as 
 (select id, 
    unnest(array_col) array_col_val 
 from dataset 
 order by id, 
 array_col_val)
, regrouping as 
 (
    select id, 
        array_agg(array_col_val) array_col 
    from ordering 
    group by id
)
select array_col, min(id) from regrouping group by array_col;

Result

   array_col   | min
--------------- -----
 {Stuck}       |   3
 {array_colay} |   5
 {filtering}   |   6
 {with}        |   4
 {Day,Good}    |   1
(5 rows)
  • Related