I have a SQL database that SOMETIMES has duplicate values, but only in one column (phone number). If there is a duplicate, the other attributes in the same row are filled in with NULL. In other cases, the phone number is not duplicated, but still has NULL values in the rows. Ex:
first_name | last_name | phone_number |
---|---|---|
john | smith | 123-456-7890 |
NULL | NULL | 123-456-7890 |
NULL | NULL | 456-789-1011 |
carry | smith | 121-314-1516 |
I'm trying to write a query that eliminates cases where the phone number is duplicated and the other values in the row are NULL, to get:
first_name | last_name | phone_number |
---|---|---|
john | smith | 123-456-7890 |
NULL | NULL | 456-789-1011 |
carry | smith | 121-314-1516 |
Any ideas?
CodePudding user response:
In cases like this you probably want a NOT EXISTS clause. This does a lookup for each row in the table, to see if there are any other records with the same phone number and populated name fields.
select
first_name,
last_name,
phone_number
from
phone_numbers pn
where
not exists (
select 1
from phone_numbers pn2
where pn2.phone_number = pn.phone_number
and pn.first_name is not null
and pn.last_name is not null
)
Although I'm not sure it's perfect. If there is a case where two records have the same phone number and both have NULL names then neither would be returned.
CodePudding user response:
One way, might be to use a subquery to identify the phone_numbers only once.. and then outer join to the records you want without nulls. Something like this:
SELECT *
FROM
(SELECT phone_number AS root_phone_number
FROM table
GROUP BY phone_number
) AS phonenumbers
LEFT OUTER JOIN
(SELECT *
FROM table
WHERE first_name IS NOT NULL
) as notnulls
ON phonenumbers.phone_number = notnulls.phone_number
CodePudding user response:
Here is the fastest way to do it, left join to the items you want to remove and then add a where clause for null results for the join. Every row that meets the join requirements WILL NOT be in the results.
I call this an Exclusionary Left Join.
SELECT *
FROM tableyoudidnotname main
LEFT JOIN tableyoudidnotname sub on
main.phone_number = sub.phone_number
and sub.first_name is null
and sub.last_name is null
WHERE sub.phone_number is null
CodePudding user response:
This will do the trick (if the table is duplicates). If you want to delete the rows:
delete from duplicates d
where first_name is null and last_name is null
and 1 < (select count(*) from duplicates where phone_number = d.phone_number);
If you just want to get the new relation:
select * from duplicates d
where not (first_name is null and last_name is null
and 1 < (select count(*) from duplicates where phone_number = d.phone_number));
CodePudding user response:
I would use cte for it. Here's the code that does it.
with cte as (
select phone_number from phone_numbers
group by phone_number
having count(*) > 1
)
delete phone_numbers
where phone_number in (select phone_number from cte)
and first_name is null and last_name is null