Home > Software engineering >  SQL: Eliminating duplicates with specific conditions
SQL: Eliminating duplicates with specific conditions

Time:05-04

I have a SQL database that SOMETIMES has duplicate values, but only in one column (phone number). If there is a duplicate, the other attributes in the same row are filled in with NULL. In other cases, the phone number is not duplicated, but still has NULL values in the rows. Ex:

first_name last_name phone_number
john smith 123-456-7890
NULL NULL 123-456-7890
NULL NULL 456-789-1011
carry smith 121-314-1516

I'm trying to write a query that eliminates cases where the phone number is duplicated and the other values in the row are NULL, to get:

first_name last_name phone_number
john smith 123-456-7890
NULL NULL 456-789-1011
carry smith 121-314-1516

Any ideas?

CodePudding user response:

In cases like this you probably want a NOT EXISTS clause. This does a lookup for each row in the table, to see if there are any other records with the same phone number and populated name fields.

select
    first_name,
    last_name,
    phone_number
from
    phone_numbers pn
where
    not exists (
        select 1
        from phone_numbers pn2
        where pn2.phone_number = pn.phone_number
        and pn.first_name is not null
        and pn.last_name is not null
    )

Although I'm not sure it's perfect. If there is a case where two records have the same phone number and both have NULL names then neither would be returned.

CodePudding user response:

One way, might be to use a subquery to identify the phone_numbers only once.. and then outer join to the records you want without nulls. Something like this:

SELECT *
FROM 
 (SELECT phone_number AS root_phone_number
  FROM table
  GROUP BY phone_number
 ) AS phonenumbers
LEFT OUTER JOIN
 (SELECT *
  FROM table
  WHERE first_name IS NOT NULL
 ) as notnulls
ON phonenumbers.phone_number = notnulls.phone_number

CodePudding user response:

Here is the fastest way to do it, left join to the items you want to remove and then add a where clause for null results for the join. Every row that meets the join requirements WILL NOT be in the results.

I call this an Exclusionary Left Join.

SELECT *
FROM tableyoudidnotname main
LEFT JOIN tableyoudidnotname sub on 
               main.phone_number = sub.phone_number 
           and sub.first_name is null 
           and sub.last_name is null
WHERE sub.phone_number is null

CodePudding user response:

This will do the trick (if the table is duplicates). If you want to delete the rows:

delete from duplicates d 
where first_name is null and last_name is null 
and 1 < (select count(*) from duplicates where phone_number = d.phone_number);

If you just want to get the new relation:

select * from duplicates d 
where not (first_name is null and last_name is null 
and 1 < (select count(*) from duplicates where phone_number = d.phone_number));

CodePudding user response:

I would use cte for it. Here's the code that does it.

    with cte as (
    select phone_number from phone_numbers 
    group by phone_number
    having count(*) > 1
)

delete phone_numbers
where phone_number in (select phone_number from cte)
and first_name is null and last_name is null
  • Related