I have two columns ind
and tar
that both contain arrays.
ind tar
{10} {10}
{6} {5,6}
{4,5,6} {5,6}
{5,6} {5,6}
{7,8} {11}
{11} {5,6,7}
{11} {8}
{9,10} {6}
I want to find if one value exists in both arrays, and if that's true, I want to keep it only at column ind
. For example, at the first row I have the value 10 in both columns. I want to end up with this value only in column ind
and leave column tar
empty. This is the expected result:
ind tar
{10}
{6} {5}
{4,5,6}
{5,6}
{7,8} {11}
{11} {5,6,7}
{11} {8}
{9,10} {6}
How can I do that in PostgreSQL?
So far I only managed to find the common elements, but I don't know how to continue with keeping them only at ind
column and remove them from tar
column.
with t1 as (
select distinct ind, tar
from table_1
join table_2 using (id)
limit 50
),
t2 as (
select ind & tar as common_el, ind , tar
from t1
)
select *
from t2
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>
which results into this:
common_el ind tar
{10} {10} {10}
{6} {6} {5,6}
{5,6} {4,5,6} {5,6}
{5,6} {5,6} {5,6}
CodePudding user response:
The &
operator you are using is from the intarray
module which also allows you to use -
to remove elements in one array from another.
For eg.
select
ind,
tar,
ind & tar as common_el,
tar - (ind & tar) as new_tar
from
table_1
ind | tar | common_el | new_tar |
---|---|---|---|
{10} | {10} | {10} | {} |
{6} | {5,6} | {6} | {5} |
{4,5,6} | {5,6} | {5,6} | {} |
{5,6} | {5,6} | {5,6} | {} |
{7,8} | {11} | {} | {11} |
{11} | {5,6,7} | {} | {5,6,7} |
{11} | {8} | {} | {8} |
{9,10} | {6} | {} | {6} |
View working demo db fiddle here
CodePudding user response:
Introduction:
I decided to tackle this using only PostgreSQL's standard functionality - i.e. without using the INTARRAY
extension - nothing wrong with using it, just fancied a challenge this afternoon! There's also a simplified method using INTARRAY functionality - simpler than the accepted answer!
Method 1 (standard array functionality with CASE - fiddle):
SELECT
(ind && tar) AS overlaps,
ind, tar,
ind - tar AS ind_m_tar,
tar - ind AS tar_m_ind,
CASE
WHEN ind = tar THEN ind -- 1
WHEN NOT (ind && tar) THEN ind -- 2
WHEN ind <@ tar
AND (CARDINALITY (tar) > CARDINALITY (ind))
THEN ind -- 3
WHEN tar <@ ind
AND (CARDINALITY (ind) > CARDINALITY (tar))
THEN ind -- 4
END AS new_ind,
CASE
WHEN ind = tar THEN NULL -- 1
WHEN NOT (ind && tar) THEN tar -- 2
WHEN ind <@ tar
AND (CARDINALITY (tar) > CARDINALITY (ind))
THEN tar - ind -- 3
WHEN tar <@ ind
AND (CARDINALITY (ind) > CARDINALITY (tar))
THEN NULL -- 4
END AS new_tar
FROM test;
Result:
overlaps ind tar ind_m_tar tar_m_ind new_ind new_tar
t {10} {10} {} {} {10} NULL
t {6} {5,6} {} {5} {6} {5}
t {4,5,6} {5,6} {4} {} {4,5,6} NULL
t {5,6} {5,6} {} {} {5,6} NULL
f {7,8} {11} {7,8} {11} {7,8} {11}
f {11} {5,6,7} {11} {5,6,7} {11} {5,6,7}
f {11} {8} {11} {8} {11} {8}
f {9,10} {6} {9,10} {6} {9,10} {6}
So, we can see that by combining standard ARRAY functions
, we can SELECT
the necessary fields using a CASE expression
- obviously, you can only SELECT
the fields that are relevant to you - I left the others in so that the logic could be followed.
Second method (using UNNEST
and INTERSECT
- fiddle):
First thing is to obtain the values to be eliminated from the tar
arrays - we use the often neglected INTERSECT operator to do this.
--
-- Get the values to be deleted from the tar arrays, by ROW_NUMBER()
--
WITH cte1 AS
(
SELECT
ROW_NUMBER() OVER () AS rn,
UNNEST(ind) AS un_ind,
UNNEST(tar) AS un_tar
FROM test
)
SELECT rn, un_ind AS x FROM cte1
INTERSECT
SELECT rn, un_tar FROM cte1
ORDER BY rn, x;
Result:
rn x
1 10
2 6
3 5
3 6
4 5
4 6
We can see that this is correct - for the 1st ind, tar
array, we want to remove the 2nd (i.e. tar
) value of 10
- for the second array, we wish to remove the duplicated 6
value, and so on.
So, our next query is:
WITH cte1 AS
(
SELECT
ROW_NUMBER() OVER () AS rn,
UNNEST(ind) AS un_ind,
UNNEST(tar) AS un_tar
FROM test
),
cte2 AS
(
SELECT rn, un_ind AS x FROM cte1
INTERSECT
SELECT rn, un_tar FROM cte1
ORDER BY rn, x
),
cte3 AS
(
SELECT
rn,
un_ind,
CASE
WHEN (cte1.rn, cte1.un_tar) =
(
SELECT rn, x
FROM cte2
WHERE rn = cte1.rn
AND x = cte1.un_tar
) THEN NULL
ELSE un_tar
END
FROM cte1
)
SELECT * FROM cte3;
Result:
rn un_ind un_tar
1 10 NULL
2 6 5
2 NULL NULL
3 4 NULL
3 5 NULL
3 6 NULL
4 5 NULL
4 6 NULL
5 7 11
5 8 NULL
6 11 5
6 NULL 6
6 NULL 7
7 11 8
8 9 6
8 10 NULL
The final step:
WITH cte1 AS
(
SELECT
ROW_NUMBER() OVER () AS rn,
UNNEST(ind) AS un_ind,
UNNEST(tar) AS un_tar
FROM test
),
cte2 AS
(
SELECT rn, un_ind AS x FROM cte1
INTERSECT
SELECT rn, un_tar FROM cte1
ORDER BY rn, x
)
SELECT rn,
ARRAY_AGG(un_ind ORDER BY un_ind),
CASE
WHEN ARRAY_REMOVE(ARRAY_AGG(un_tar ORDER BY un_tar), NULL) = '{}' THEN NULL
ELSE ARRAY_REMOVE(ARRAY_AGG(un_tar ORDER BY un_tar), NULL)
END
FROM
(
SELECT
rn,
un_ind,
CASE
WHEN (cte1.rn, cte1.un_tar) =
(
SELECT rn, x
FROM cte2
WHERE rn = cte1.rn
AND x = cte1.un_tar
) THEN NULL
ELSE un_tar
END
FROM cte1
) AS tab
WHERE un_ind IS NOT NULL
GROUP BY rn;
Result:
rn array_agg array_remove
1 {10} NULL
2 {6} {5}
3 {4,5,6} NULL
4 {5,6} NULL
5 {7,8} {11}
6 {11} {5}
7 {11} {8}
8 {9,10} {6}
Et voilà - the desired result!
Method 3 (Simplified INTARRAY - fiddle):
SELECT
ind,
CASE
WHEN (tar - ind) = '{}' THEN NULL
ELSE (tar - ind)
END
FROM
test;
Same results as above!
A performance analysis (for what it's worth) is here - my simplified INTARRAY appears to be the quickest... ( 1 for question that got me thinking!)...