I have a list of people that could have more than 1 row in Oracle (SQL). Such as below:
NAME | SSN | ADDRESS | COUNTY | PHONE |
---|---|---|---|---|
MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA | 555-456-1234 |
MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA | 555-456-1235 |
MARY MOUSE | 456-78-9012 | 456 MAIN ST | LA MESA | 555-456-1234 |
MARY MOUSE | 456-78-9012 | 789 MAIN ST | SANTA CLARA | 555-456-1234 |
I am trying to put an indicator column labeled unique based on name, SSN, and county. Only 1 of the rows can have a Y for unique and any subsequent rows with that name, ssn, county combo would be N. Like this:
NAME | SSN | ADDRESS | COUNTY | PHONE | UNIQUE |
---|---|---|---|---|---|
MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA | 555-456-1234 | Y |
MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA | 555-456-1235 | N |
MARY MOUSE | 456-78-9012 | 456 MAIN ST | LA MESA | 555-456-1234 | N |
MARY MOUSE | 456-78-9012 | 789 MAIN ST | SANTA CLARA | 555-456-1234 | Y |
Any ideas how I would go about doing this in Oracle(SQL)?
CodePudding user response:
You can use ROW_NUMBER() Over (Partition By NAME,SSN,COUNTY Order By ADDRESS,PHONE)
then check if the row number value is equal to one as the following:
Select NAME,SSN,ADDRESS,COUNTY,PHONE,
(Case When RN=1 Then 'Y' Else 'N' End) AS UNI
From
(
Select NAME,SSN,ADDRESS,COUNTY,PHONE,
ROW_NUMBER() Over (Partition By NAME,SSN,COUNTY Order By ADDRESS,PHONE) RN
From MyData
) D
Or
Select NAME,SSN,ADDRESS,COUNTY,PHONE,
(
Case When ROW_NUMBER() Over (Partition By NAME,SSN,COUNTY Order By ADDRESS,PHONE)=1
Then 'Y' Else 'N' End
) AS UNI
From MyData
See a demo from db<>fiddle.
CodePudding user response:
I personally prefer lag/lead
to generate the first row indicator. These functions accept three arguments:
- an expression to get at the offset
- an offset in the number of rows to get an expression from
- a default value to provide as a result when the function goes beyond the partition boundary.
So the first argument should be the value to set for all the rows except the first one, the third argument should be the value to set for the first row (where lag
would leave the partition boundary). This code is more compact as it doesn't involve case
expressions around an analytic function (which is quite hard to read).
with a(NAME, SSN, ADDRESS, COUNTY, PHONE) as ( select 'MARY MOUSE', '456-78-9012', '123 MAIN ST', 'LA MESA', '555-456-1234' from dual union all select 'MARY MOUSE', '456-78-9012', '123 MAIN ST', 'LA MESA', '555-456-1235' from dual union all select 'MARY MOUSE', '456-78-9012', '456 MAIN ST', 'LA MESA', '555-456-1234' from dual union all select 'MARY MOUSE', '456-78-9012', '789 MAIN ST', 'SANTA CLARA', '555-456-1234' from dual ) select a.*, lag('N', 1, 'Y') over(partition by name, ssn, county order by null) as unique_ from a
NAME | SSN | ADDRESS | COUNTY | PHONE | UNIQUE_ :--------- | :---------- | :---------- | :---------- | :----------- | :------ MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA | 555-456-1234 | Y MARY MOUSE | 456-78-9012 | 123 MAIN ST | LA MESA | 555-456-1235 | N MARY MOUSE | 456-78-9012 | 456 MAIN ST | LA MESA | 555-456-1234 | N MARY MOUSE | 456-78-9012 | 789 MAIN ST | SANTA CLARA | 555-456-1234 | Y
db<>fiddle here