Home > Software design >  PostgresSQL: Getting number of correspondences between two arrays
PostgresSQL: Getting number of correspondences between two arrays

Time:08-30

I want to compare and get number of corresposndeces between two arrays in PostgresSQL.

For example: given the table

id array
1 a, b, d
2 a, e
3 a, b, d, e
4 b, c, d

I would like to obtain a comparison table:

id_1 id_2 number_of_matches
1 2 1
1 3 3
1 4 2
2 3 2
2 4 0
3 4 2

There is no repeated values in each array, and the order of items inside arrays does not matter.

I found here about the overlap operator (&&) but it only returns a boolean if there is or not any correspondence between two arrays, but not the number of correspondences.

CREATE TABLE data (id integer, val text[]);
INSERT INTO data VALUES (1, '{a,b,d}');
INSERT INTO data VALUES (2, '{a,e}');
INSERT INTO data VALUES (3, '{a,b,d,e}');
INSERT INTO data VALUES (4, '{b,c,d}');
SELECT a.id id_1, b.id id_2, a.val && b.val match 
FROM data a, data b
WHERE a.id < b.id;

returns

id_1 id_2 match
1 2 t
1 3 t
1 4 t
2 3 t
2 4 f
3 4 t

CodePudding user response:

demo:db<>fiddle

WITH normalized AS (                      -- 2
    SELECT
        id,
        unnest(val) as array_element      -- 1
    FROM data
)
SELECT
    n1.id,
    n2.id,
    COUNT(*)
FROM normalized n1
JOIN normalized n2                        -- 3
    ON (n1.array_element = n2.array_element AND n1.id <> n2.id)
GROUP BY n1.id, n2.id                     -- 4
ORDER BY n1.id, n2.id
  1. Normalize your data (note: You should think about normalizing your data directly in the table; means: no arrays) by creating one row per array element
  2. Use a CTE to "store" your normalized result
  3. Self-join your normalized result (but not the same ids) at the array_element
  4. GROUP and COUNT

CodePudding user response:

https://www.db-fiddle.com/f/sobBPcfXY7h8j7y6tKXfDj/0

get the intersection of the 2 arrays an count the elements of it:

 SELECT 
   data.id,
   data2.id as id2,
   data.val,
   data2.val val2,
   ARRAY( SELECT * FROM UNNEST( data.val ) WHERE UNNEST = ANY( data2.val )) as intersection,
   cardinality(ARRAY( SELECT * FROM UNNEST( data.val ) WHERE UNNEST = ANY( data2.val ))) as number_of_matches
   from data,data as data2
 where data.id <> data2.id
  • Related