Home > front end >  how to check for duplicate rows of all the columns
how to check for duplicate rows of all the columns

Time:10-22

I want to check for duplicate rows . and see there column values . if there were only few columns in my table - 2 for example - I would have done something like: '''

select col1, col2 ,count(*)
from mytable 
group by col1,col2
having count(*) > 1.

''' but I have dozens of column in my table .... and using the above syntax is tedious to specify all the columns in the table. trying another approach with select distinct ... will not identify for me the content of duplicated rows . I tried somthing like '''

select * , count (*)
from my table
group by *

''' but that doesn't work.

CodePudding user response:

Write a query which will write a query for you.

For example, "john smith" is a duplicate here:

SQL> select * from my_data order by 1;

FULL_NAME  FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
h gonzalez h                    gonzalez
john smith john                 smith
john smith john                 smith
rudy chan  rudy                 chan

Query uses user_tab_columns and aggregates all column names, concatenating them to the rest of a select statement:

SQL>   SELECT    'select '
  2           || LISTAGG (column_name, ', ') WITHIN GROUP (ORDER BY column_id)
  3           || ', count(*) cnt '
  4           || CHR (10)
  5           || '  from '
  6           || table_name
  7           || CHR (10)
  8           || '  group by '
  9           || LISTAGG (column_name, ', ') WITHIN GROUP (ORDER BY column_id)
 10           || CHR (10)
 11           || '  having count(*) > 1;' statement_to_run
 12      FROM user_tab_columns
 13     WHERE table_name = 'MY_DATA'
 14  GROUP BY table_name;

STATEMENT_TO_RUN
--------------------------------------------------------------------------------
select FULL_NAME, FIRST_NAME, LAST_NAME, count(*) cnt
  from MY_DATA
  group by FULL_NAME, FIRST_NAME, LAST_NAME
  having count(*) > 1;

Now, copy/paste the above statement_to_run and get the result:

SQL> select FULL_NAME, FIRST_NAME, LAST_NAME, count(*) cnt
  2   from MY_DATA group by
  3  FULL_NAME, FIRST_NAME, LAST_NAME having count(*) > 1;

FULL_NAME  FIRST_NAME           LAST_NAME                   CNT
---------- -------------------- -------------------- ----------
john smith john                 smith                         2

SQL>

CodePudding user response:

Just write out all the columns.

there are dozens of columns ,about 30 , and there names look like : 'AGtrf-456F_RValue'

Copy-paste.

In SQL*Plus you can use the DESCRIBE command to describe a table and you can copy the column names from the table description.

Or you can list all the columns using:

SELECT '"' || column_name || '",'
FROM   user_tab_columns
WHERE  table_name = 'MY_DATA'
ORDER BY column_id;

And then copy-paste the output into your query into the SELECT and GROUP BY clauses.


Can you generate the query automatically.

Yes, but it usually is not worth it as it takes longer to write a query to generate the query than it does just to list the columns and copy-paste.

If you have lots of column names that require you to use quoted identifiers (i.e. they are mixed-case or use non-standard characters like -) then you can use:

SELECT EMPTY_CLOB()
       || 'SELECT '
       || LISTAGG('"' || column_name || '"', ',') WITHIN GROUP (ORDER BY column_id)
       || ', COUNT(1) FROM MY_DATA GROUP BY '
       || LISTAGG('"' || column_name || '"', ',') WITHIN GROUP (ORDER BY column_id)
       || ' HAVING COUNT(1) > 1;'
FROM   user_tab_columns
WHERE  table_name = 'MY_DATA'
ORDER BY column_id;

Which works unless you have too many columns and LISTAGG exceeds 4000 characters then you would need to use something like:

WITH columns (col, pos) AS (
  SELECT '"' || column_name || '",',
        column_id
  FROM   user_tab_columns
  WHERE  table_name = 'MY_DATA'
  ORDER BY column_id
)
SELECT sql
FROM   (
  SELECT 'SELECT ' AS sql, 0 FROM DUAL
  UNION ALL
  SELECT col, pos FROM columns
  UNION ALL
  SELECT ' COUNT(1) FROM MY_DATA GROUP BY ', 10000 FROM DUAL
  UNION ALL
  SELECT col, 10000   pos FROM columns
  UNION ALL
  SELECT '1 HAVING COUNT(1) > 1', 20000 FROM DUAL
  ORDER BY 2
)

fiddle

  • Related