Home > Software engineering >  How do I create a unique indicator in SQL based on subset of columns?
How do I create a unique indicator in SQL based on subset of columns?

Time:07-20

I have a list of people that could have more than 1 row in Oracle (SQL). Such as below:

NAME SSN ADDRESS COUNTY PHONE
MARY MOUSE 456-78-9012 123 MAIN ST LA MESA 555-456-1234
MARY MOUSE 456-78-9012 123 MAIN ST LA MESA 555-456-1235
MARY MOUSE 456-78-9012 456 MAIN ST LA MESA 555-456-1234
MARY MOUSE 456-78-9012 789 MAIN ST SANTA CLARA 555-456-1234

I am trying to put an indicator column labeled unique based on name, SSN, and county. Only 1 of the rows can have a Y for unique and any subsequent rows with that name, ssn, county combo would be N. Like this:

NAME SSN ADDRESS COUNTY PHONE UNIQUE
MARY MOUSE 456-78-9012 123 MAIN ST LA MESA 555-456-1234 Y
MARY MOUSE 456-78-9012 123 MAIN ST LA MESA 555-456-1235 N
MARY MOUSE 456-78-9012 456 MAIN ST LA MESA 555-456-1234 N
MARY MOUSE 456-78-9012 789 MAIN ST SANTA CLARA 555-456-1234 Y

Any ideas how I would go about doing this in Oracle(SQL)?

CodePudding user response:

You can use ROW_NUMBER() Over (Partition By NAME,SSN,COUNTY Order By ADDRESS,PHONE) then check if the row number value is equal to one as the following:

Select NAME,SSN,ADDRESS,COUNTY,PHONE,
(Case When RN=1 Then 'Y' Else 'N' End) AS UNI
From
(
  Select NAME,SSN,ADDRESS,COUNTY,PHONE,
  ROW_NUMBER() Over (Partition By NAME,SSN,COUNTY Order By ADDRESS,PHONE) RN
  From MyData
) D

Or

Select NAME,SSN,ADDRESS,COUNTY,PHONE,
(
  Case When ROW_NUMBER() Over (Partition By NAME,SSN,COUNTY Order By ADDRESS,PHONE)=1 
  Then 'Y' Else 'N' End
) AS UNI
From MyData

See a demo from db<>fiddle.

CodePudding user response:

I personally prefer lag/lead to generate the first row indicator. These functions accept three arguments:

  • an expression to get at the offset
  • an offset in the number of rows to get an expression from
  • a default value to provide as a result when the function goes beyond the partition boundary.

So the first argument should be the value to set for all the rows except the first one, the third argument should be the value to set for the first row (where lag would leave the partition boundary). This code is more compact as it doesn't involve case expressions around an analytic function (which is quite hard to read).

with a(NAME, SSN, ADDRESS, COUNTY, PHONE) as (
select 'MARY MOUSE', '456-78-9012', '123 MAIN ST', 'LA MESA', '555-456-1234' from dual union all
select 'MARY MOUSE', '456-78-9012', '123 MAIN ST', 'LA MESA', '555-456-1235' from dual union all
select 'MARY MOUSE', '456-78-9012', '456 MAIN ST', 'LA MESA', '555-456-1234' from dual union all
select 'MARY MOUSE', '456-78-9012', '789 MAIN ST', 'SANTA CLARA', '555-456-1234' from dual
)
select
  a.*,
  lag('N', 1, 'Y')
    over(partition by name, ssn, county order by null) as unique_
from a
NAME       | SSN         | ADDRESS     | COUNTY      | PHONE        | UNIQUE_
:--------- | :---------- | :---------- | :---------- | :----------- | :------
MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA     | 555-456-1234 | Y      
MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA     | 555-456-1235 | N      
MARY MOUSE | 456-78-9012 | 456 MAIN ST | LA MESA     | 555-456-1234 | N      
MARY MOUSE | 456-78-9012 | 789 MAIN ST | SANTA CLARA | 555-456-1234 | Y      

db<>fiddle here

  • Related