I have a function which returns a table. One of the columns happens to be a text array. Currently the values in this array column will only ever have at most 2 elements, however there is the instance when the same row will be returned twice with the duplicate row elements in the opposite order. I'm hoping to find a way to only return 1 of these rows and discard the other. To give an example, I run a function as
SELECT * FROM schema.function($1,$2,$3)
WHERE conditions...;
which returns me something like this
ID | ARRAY_COL | ...
1 | {'Good','Day'} | ...
2 | {'Day','Good'} | ...
3 | {'Stuck'} | ...
4 | {'with'} | ...
5 | {'array'} | ...
6 | {'filtering'} | ...
So in this example, I want to return the whole result set with the exception that I only want either row 1 or 2 as they have the same elements in the array (albeit inverted with respect to each other). I'm aware this is probably a bit of a messy problem, but it's something I need to get to the bottom of. Ideally I would like to stick a WHERE clause at the end of my function call which forced the result set to ignore any array value that had the same elements as a previous row. Pseudo code might be something like
SELECT * FROM schema.function($1,$2,$3)
WHERE NOT array_col @> (any previous array_col value);
Any pointers in the right direction would be much appreciated, thanks.
CodePudding user response:
Not sure is the best solution, but could work especially in cases where you might have partially overlapping arrays.
The solution is in 3 steps:
unnest
thearray_col
column and order byid
and the item value
select
id,
unnest(array_col) array_col_val
from dataset
order by
id,
array_col_val
- regroup by
id
, now row withid
1 and 2 have the samearray_col
value
select id,
array_agg(array_col_val) array_col
from ordering
group by id
- Select the min
id
grouping byarray_col
select array_col, min(id) from regrouping group by array_col
Full statement
with dataset as (
select 1 id, ARRAY['Good','Day'] array_col UNION ALL
select 2 id, ARRAY['Day','Good'] array_col UNION ALL
select 3 id, ARRAY['Stuck'] array_col UNION ALL
select 4 id, ARRAY['with'] array_col UNION ALL
select 5 id, ARRAY['array_colay'] array_col UNION ALL
select 6 id, ARRAY['filtering'] array_col
)
, ordering as
(select id,
unnest(array_col) array_col_val
from dataset
order by id,
array_col_val)
, regrouping as
(
select id,
array_agg(array_col_val) array_col
from ordering
group by id
)
select array_col, min(id) from regrouping group by array_col;
Result
array_col | min
--------------- -----
{Stuck} | 3
{array_colay} | 5
{filtering} | 6
{with} | 4
{Day,Good} | 1
(5 rows)