Given the following table, I need to merge the fields in different "id" only if they are the same type (person or dog), and always as the value of every field of an "id" is contained in the values of other "ids".
id | being | feature | values |
---|---|---|---|
1 | person | name | John;Paul |
1 | person | surname | Smith |
2 | dog | name | Ringo |
3 | dog | name | Snowy |
4 | person | name | John |
4 | person | surname | |
5 | person | name | John;Ringo |
5 | person | surname | Smith |
In this example, the merge results should be as follows:
1 and 4 (Since 4's name is present in 1's name and 4's surname is empty) 1 and 5 cannot be merged (the name field show different values) 4 and 5 can be merged 2 and 3 (dogs) cannot be merged. They have only the field "name" and they do not share values. 2 and 3 cannot be merged with 1, 4, 5 since they have different values in "being".
id | being | feature | values |
---|---|---|---|
1 | person | name | John;Paul |
1 | person | surname | Smith |
2 | dog | name | Ringo |
3 | dog | name | Snowy |
5 | person | name | John;Ringo |
5 | person | surname | Smith |
I have tried this:
UPDATE table a
SET values = (SELECT array_to_string(array_agg(distinct values),';') AS values FROM table b
WHERE a.being= b.being
AND a.feature= b.feature
AND a.id<> b.id
AND a.values LIKE '%'||a.values||'%'
)
WHERE (select count (*) FROM (SELECT DISTINCT c.being, c.id from table c where a.being=c.being) as temp) >1
;
This doesn't work well because it will merge, for example, 1 and 5. Besides, it duplicates values when merging that field.
Cloud you please help me? I'd appreciate your help very much!
CodePudding user response:
One option is to aggregate names with surnames on "id" and "being". Once you get a single string per "id", a self join may find when a full name is completely included inside another (where the "being" is same for both "id"s), then you just select the smallest fullname, candidate for deletion:
WITH cte AS (
SELECT id,
being,
STRING_AGG(values, ';') AS fullname
FROM tab
GROUP BY id,
being
)
DELETE FROM tab
WHERE id IN (SELECT t2.id
FROM cte t1
INNER JOIN cte t2
ON t1.being = t2.being
AND t1.id > t2.id
AND t1.fullname LIKE CONCAT('%',t2.fullname,'%'));
Check the demo here.