Consider this data:
CREATE TABLE #FakeTable (
PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int
)
INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, '[email protected]', NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', '[email protected]', '123 Fake Street', 9),
('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
('7891011', 'Bob Weir', NULL, NULL, '[email protected]', '456 Nope Street', 10),
('7891011', 'Bob Weir', '19471610', NULL, '[email protected]', '456 Nope Street', 6)
And we then have:
PersonID PatientNM BirthDTS PhoneNBR EmailAddress StreetAddress KeyFieldCNT
123456 Jerry Garcia NULL NULL [email protected] NULL 11
123456 Jerry Garcia 19421401 999-999-9999 NULL NULL 11
123456 Jerry Garcia 19421401 999-999-9999 [email protected] 123 Fake Street 9
7891011 Bob Weir 19471610 NULL NULL 456 Nope Street 11
7891011 Bob Weir NULL NULL [email protected] 456 Nope Street 10
7891011 Bob Weir 19471610 NULL [email protected] 456 Nope Street 6
I want to use ROW_NUMBER() and partition by the PersonID and order by the KeyFieldCNT, in order to establish a ranking. I would like the top ranking row to return only non-null values for each column, if that column happens to be populated in another lower ranking row.
Using just a normal ROW_NUMBER() and partition/order by, here are the results:
WITH CTE AS (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT DESC) AS RowNBR
FROM #FakeTable
)
SELECT *
FROM CTE
WHERE RowNBR = '1'
Results In:
PersonID PatientNM BirthDTS PhoneNBR EmailAddress StreetAddress KeyFieldCNT RowNBR
123456 Jerry Garcia NULL NULL [email protected] NULL 11 1
7891011 Bob Weir 19471610 NULL NULL 456 Nope Street 11 1
My Desired results are these:
PersonID PatientNM BirthDTS PhoneNBR EmailAddress StreetAddress KeyFieldCNT RowNBR
123456 Jerry Garcia 19421401 999-999-9999 [email protected] 123 Fake Street 11 1
7891011 Bob Weir 19471610 NULL [email protected] 456 Nope Street 11 1
Does anyone know how to do this? I have tried using this as an example, but it is not giving me what I need when I extrapolate to more than just one field of interest (SQL - Return first non NULL Value after RANK() or ROW_NUMBER()).
CodePudding user response:
You can try this one:
CREATE TABLE #FakeTable (
PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int
)
INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, '[email protected]', NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
('123456', 'Jerry Garcia', '19421401', '999-999-9999', '[email protected]', '123 Fake Street', 9),
('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
('7891011', 'Bob Weir', NULL, NULL, '[email protected]', '456 Nope Street', 10),
('7891011', 'Bob Weir', '19471610', NULL, '[email protected]', '456 Nope Street', 6)
SELECT ft.PersonID,
ft.PatientNM,
MAX(ft.BirthDTS) AS BirthDTS,
MAX(ft.PhoneNBR) AS PhoneNBR,
MAX(ft.EmailAddress) AS EmailAddress,
MAX(ft.StreetAddress) AS StreetAddress,
MAX(ft.KeyFieldCNT) AS KeyFieldCNT
FROM #FakeTable AS ft
GROUP BY PersonID, ft.PatientNM
CodePudding user response:
You have a problem. When filtering with row_number()
, the values for all of the columns will come from the same row. This is actually a feature; it's a good thing, and the main reason I've been slowly replacing a lot of old GROUP BY
code to use row_number()
instead.
But for person ID 123456 you want to see non-NULL data for both PhoneNBR
and EmailAddress
, and also show the highest value (11
) for KeyFieldCNT
.
There is no such row in the table.
Both rows for person 123456 with KeyFieldCount = 11 are missing one field or the other.
I suspect KeyFieldCnt
is maintained by the application to show how many of certain fields are populated, and you want to show the row for each user with the most data. Unfortunately, that row is missing one of the fields you care about, even if it has more of the others.
There are ways around this, but if you need to be able to merge data with multiple possible missing fields you will have to join back to the table again for each possible field... potentially quite slow and lengthy. Instead, this seems like a data problem. The table is not storing all the data in all the places you expect, and the better solution is to find and restore/repair the missing data in the table.
Assuming PatientID
, PatientNM
, and KeyFieldCNT
cannot be NULL, you could try this:
WITH CTE AS (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT) AS RowNBR
FROM FakeTable
)
SELECT CTE.PersonID, CTE.PatientNM,
coalesce(CTE.BirthDTS, max(ft.BirthDTS)) BirthDTS,
coalesce(CTE.PhoneNBR, max(ft.PhoneNBR)) PhoneNBR,
coalesce(CTE.EmailAddress, max(ft.EmailAddress)) EmailAddress,
coalesce(CTE.StreetAddress, max(ft.StreetAddress)) StreetAddress,
CTE.KeyFieldCNT
FROM CTE
LEFT JOIN FakeTable ft ON ft.PersonID = CTE.PersonID
WHERE RowNBR = 1
GROUP BY CTE.PersonID, CTE.PatientNM, CTE.BirthDTS, CTE.PhoneNBR,
CTE.EmailAddress, CTE.StreetAddress, CTE.KeyFieldCNT
See it here:
Just be aware: this is mixing values from different rows, which is almost always a bad idea, and there is no meaningful selection for which row has priority to fill the missing data.
A more correct approach involves an additional scan of the table for each field you care about, so you can specify the priority and other criteria for filling in the missing values. But that will be much longer and crazy slow.
CodePudding user response:
I tried selecting each column individually with
SELECT TOP 1 <col> FROM ... WHERE <col> IS NOT NULL ORDER BY ... DESC
and it seems to do the trick, as follows:
;WITH X (PersonID) AS (
SELECT DISTINCT PersonID FROM #FakeTable
)
SELECT X.PersonID
, (SELECT TOP 1 F.PatientNM FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.PatientNM IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS PatientNM
, (SELECT TOP 1 F.BirthDTS FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.BirthDTS IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS BirthDTS
, (SELECT TOP 1 F.PhoneNBR FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.PhoneNBR IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS PhoneNBR
, (SELECT TOP 1 F.EmailAddress FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.EmailAddress IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS EmailAddress
, (SELECT TOP 1 F.StreetAddress FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.StreetAddress IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS StreetAddress
, (SELECT TOP 1 F.KeyFieldCNT FROM #FakeTable F WHERE F.PersonID = X.PersonID
AND F.KeyFieldCNT IS NOT NULL ORDER BY F.KeyFieldCNT DESC)
AS KeyFieldCNT
FROM X
Results:
PersonID | PatientNM | BirthDTS | PhoneNBR | EmailAddress | StreetAddress | KeyFieldCNT |
---|---|---|---|---|---|---|
123456 | Jerry Garcia | 19421401 | 999-999-9999 | [email protected] | 123 Fake Street | 11 |
7891011 | Bob Weir | 19471610 | null | [email protected] | 456 Nope Street | 11 |
DB Fiddle: https://dbfiddle.uk/ErNqLqXw
For completeness I used the same pattern also for the KeyFieldCNT
column, which can be probably be simplified.
Furthermore I don't have RowNBR
, it can't be determined and has no meaning as there isn't a single row that provides all data for a result row.
CodePudding user response:
When you want/need the record with all (or the most) non-NULL values on top, you can do this:
select *,
IIF(PersonId is null,0,1)
IIF(PatientNM is null,0,1)
IIF(BirthDTS is null,0,1)
IIF(PhoneNBR is null,0,1)
IIF(EmailAddress is null,0,1)
IIF(StreetAddress is null,0,1)
IIF(KeyFieldCNT is null,0,1) ColumnsWithData
from #FakeTable
ORDER BY
IIF(PersonId is null,0,1)
IIF(PatientNM is null,0,1)
IIF(BirthDTS is null,0,1)
IIF(PhoneNBR is null,0,1)
IIF(EmailAddress is null,0,1)
IIF(StreetAddress is null,0,1)
IIF(KeyFieldCNT is null,0,1) DESC
;
see: DBFIDDLE