Home > Enterprise >  Finding duplicate values in multiple colums in a SQL table and count for chars
Finding duplicate values in multiple colums in a SQL table and count for chars

Time:12-29

referring to this question:

Finding duplicate values in multiple colums in a SQL table and count

I have the following table structure:

id name1 name2 name3  ...
 1 Hans  Peter Frank
 2 Hans  Frank Peter
 3 Hans  Peter Frank
 4 Paul  Peter Hans
 .
 .
 .

I use the following command to display the duplicates and the counts:

SELECT COUNT(name1), name1, name2, name3 
FROM table 
GROUP BY name1, name2, name3 
HAVING (COUNT(name1) > 1) AND (COUNT(name2) > 1) AND (COUNT(name3) > 1)

This command gives me a count of 2. I would like to know how the second line could also be counted as a dublicate.

Unfortunately, the solution to the original question (Finding duplicate values in multiple colums in a SQL table and count) does not work for char

CodePudding user response:

First normalize the table with UNION ALL in a CTE to get each of the 3 names in a separate row.
Then with ROW_NUMBER() window function you can rank alphabetically the 3 names so that you can group by them:

WITH cte(id, name) AS (
  SELECT id, name1 FROM tablename
  UNION ALL
  SELECT id, name2 FROM tablename
  UNION ALL
  SELECT id, name3 FROM tablename
)
SELECT COUNT(*) count, name1, name2, name3
FROM (
  SELECT id,
         MAX(CASE WHEN rn = 1 THEN name END) name1,
         MAX(CASE WHEN rn = 2 THEN name END) name2,
         MAX(CASE WHEN rn = 3 THEN name END) name3
  FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) rn
    FROM cte
  )
  GROUP BY id
)
GROUP BY name1, name2, name3
HAVING COUNT(*) > 1;

See the demo.

CodePudding user response:

Not particularly pretty but a different approach to pivot the columns and then aggregate them as a string and counting duplicates. Slightly fiddly as unfortunately in SQL Lite the group_concat function can't specify any ordering within the group, forcing another level of nesting and a row_number. I guess that's part of being SQL lite!

You could remove the Seq and filtering criteria if you wanted to show all the duplicate variations.

with cte as (
    select Duplicates, name1, name2, name3, 
      Row_Number() over(partition by Duplicates order by name1,name2,name3) Seq
    from (
      select count(*) over(partition by allnames) Duplicates, name1, name2, name3
      from t
      left join (
      select Id, group_concat(Dname) allNames
      from (
          select Id, Dname, row_number() over (partition by Id order by Dname) seq
              from (
                  select id, name1 Dname from t union all
                  select id, name2 from t union all
                  select id, name3 from t
              )x
       )x
     group by Id
     order by seq
     )d on d.id=t.id
   )d
)
select Duplicates, name1, name2, name3
from cte
where duplicates>1 and seq=1

Demo Fiddle

  • Related