referring to this question:
Finding duplicate values in multiple colums in a SQL table and count
I have the following table structure:
id name1 name2 name3 ...
1 Hans Peter Frank
2 Hans Frank Peter
3 Hans Peter Frank
4 Paul Peter Hans
.
.
.
I use the following command to display the duplicates and the counts:
SELECT COUNT(name1), name1, name2, name3
FROM table
GROUP BY name1, name2, name3
HAVING (COUNT(name1) > 1) AND (COUNT(name2) > 1) AND (COUNT(name3) > 1)
This command gives me a count of 2. I would like to know how the second line could also be counted as a dublicate.
Unfortunately, the solution to the original question (Finding duplicate values in multiple colums in a SQL table and count) does not work for char
CodePudding user response:
First normalize the table with UNION ALL
in a CTE to get each of the 3 names in a separate row.
Then with ROW_NUMBER()
window function you can rank alphabetically the 3 names so that you can group by them:
WITH cte(id, name) AS (
SELECT id, name1 FROM tablename
UNION ALL
SELECT id, name2 FROM tablename
UNION ALL
SELECT id, name3 FROM tablename
)
SELECT COUNT(*) count, name1, name2, name3
FROM (
SELECT id,
MAX(CASE WHEN rn = 1 THEN name END) name1,
MAX(CASE WHEN rn = 2 THEN name END) name2,
MAX(CASE WHEN rn = 3 THEN name END) name3
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) rn
FROM cte
)
GROUP BY id
)
GROUP BY name1, name2, name3
HAVING COUNT(*) > 1;
See the demo.
CodePudding user response:
Not particularly pretty but a different approach to pivot the columns and then aggregate them as a string and counting duplicates. Slightly fiddly as unfortunately in SQL Lite the group_concat function can't specify any ordering within the group, forcing another level of nesting and a row_number. I guess that's part of being SQL lite!
You could remove the Seq and filtering criteria if you wanted to show all the duplicate variations.
with cte as (
select Duplicates, name1, name2, name3,
Row_Number() over(partition by Duplicates order by name1,name2,name3) Seq
from (
select count(*) over(partition by allnames) Duplicates, name1, name2, name3
from t
left join (
select Id, group_concat(Dname) allNames
from (
select Id, Dname, row_number() over (partition by Id order by Dname) seq
from (
select id, name1 Dname from t union all
select id, name2 from t union all
select id, name3 from t
)x
)x
group by Id
order by seq
)d on d.id=t.id
)d
)
select Duplicates, name1, name2, name3
from cte
where duplicates>1 and seq=1