Return non-null values for every field in ROW

Consider this data:

CREATE TABLE #FakeTable (

PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int

)

INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, '[email protected]', NULL, 11),
        ('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
        ('123456', 'Jerry Garcia', '19421401', '999-999-9999', '[email protected]', '123 Fake Street', 9),
        ('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
        ('7891011', 'Bob Weir', NULL, NULL, '[email protected]', '456 Nope Street', 10),
        ('7891011', 'Bob Weir', '19471610', NULL, '[email protected]', '456 Nope Street', 6)

And we then have:

PersonID  PatientNM     BirthDTS       PhoneNBR      EmailAddress        StreetAddress    KeyFieldCNT
123456    Jerry Garcia   NULL          NULL          [email protected]    NULL               11
123456    Jerry Garcia   19421401      999-999-9999  NULL                NULL               11
123456    Jerry Garcia   19421401      999-999-9999  [email protected]    123 Fake Street    9
7891011   Bob Weir       19471610      NULL          NULL                456 Nope Street    11
7891011   Bob Weir       NULL          NULL          [email protected]       456 Nope Street    10
7891011   Bob Weir       19471610      NULL          [email protected]       456 Nope Street    6

I want to use ROW_NUMBER() and partition by the PersonID and order by the KeyFieldCNT, in order to establish a ranking. I would like the top ranking row to return only non-null values for each column, if that column happens to be populated in another lower ranking row.

Using just a normal ROW_NUMBER() and partition/order by, here are the results:

WITH CTE AS (

SELECT *
,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT DESC) AS RowNBR
FROM #FakeTable

)

SELECT *
FROM CTE
WHERE RowNBR = '1'

Results In:

PersonID  PatientNM     BirthDTS  PhoneNBR       EmailAddress      StreetAddress      KeyFieldCNT   RowNBR
123456    Jerry Garcia  NULL      NULL           [email protected]    NULL               11          1
7891011   Bob Weir      19471610  NULL           NULL                456 Nope Street    11          1

My Desired results are these:

PersonID  PatientNM     BirthDTS  PhoneNBR       EmailAddress      StreetAddress      KeyFieldCNT   RowNBR
123456    Jerry Garcia  19421401  999-999-9999   [email protected]  123 Fake Street    11            1
7891011   Bob Weir      19471610  NULL           [email protected]     456 Nope Street    11            1

Does anyone know how to do this? I have tried using this as an example, but it is not giving me what I need when I extrapolate to more than just one field of interest (SQL - Return first non NULL Value after RANK() or ROW_NUMBER()).

CodePudding user response：

You can try this one:

CREATE TABLE #FakeTable (

PersonID varchar(255)
,PatientNM varchar(255)
,BirthDTS varchar(255)
,PhoneNBR varchar(255)
,EmailAddress varchar(255)
,StreetAddress varchar(255)
,KeyFieldCNT int

)

INSERT INTO #FakeTable (PersonID, PatientNM, BirthDTS, PhoneNBR, EmailAddress, StreetAddress, KeyFieldCNT)
VALUES('123456', 'Jerry Garcia', NULL, NULL, '[email protected]', NULL, 11),
        ('123456', 'Jerry Garcia', '19421401', '999-999-9999', NULL, NULL, 11),
        ('123456', 'Jerry Garcia', '19421401', '999-999-9999', '[email protected]', '123 Fake Street', 9),
        ('7891011', 'Bob Weir', '19471610', NULL, NULL, '456 Nope Street', 11),
        ('7891011', 'Bob Weir', NULL, NULL, '[email protected]', '456 Nope Street', 10),
        ('7891011', 'Bob Weir', '19471610', NULL, '[email protected]', '456 Nope Street', 6)

    SELECT ft.PersonID,
           ft.PatientNM,
           MAX(ft.BirthDTS) AS BirthDTS,
           MAX(ft.PhoneNBR) AS PhoneNBR,
           MAX(ft.EmailAddress) AS EmailAddress,
           MAX(ft.StreetAddress) AS StreetAddress,
           MAX(ft.KeyFieldCNT) AS  KeyFieldCNT
    FROM #FakeTable AS ft
    GROUP BY PersonID, ft.PatientNM

CodePudding user response：

You have a problem. When filtering with row_number(), the values for all of the columns will come from the same row. This is actually a feature; it's a good thing, and the main reason I've been slowly replacing a lot of old GROUP BY code to use row_number() instead.

But for person ID 123456 you want to see non-NULL data for both PhoneNBR and EmailAddress, and also show the highest value (11) for KeyFieldCNT.

There is no such row in the table.

Both rows for person 123456 with KeyFieldCount = 11 are missing one field or the other.

I suspect KeyFieldCnt is maintained by the application to show how many of certain fields are populated, and you want to show the row for each user with the most data. Unfortunately, that row is missing one of the fields you care about, even if it has more of the others.

There are ways around this, but if you need to be able to merge data with multiple possible missing fields you will have to join back to the table again for each possible field... potentially quite slow and lengthy. Instead, this seems like a data problem. The table is not storing all the data in all the places you expect, and the better solution is to find and restore/repair the missing data in the table.

Assuming PatientID, PatientNM, and KeyFieldCNT cannot be NULL, you could try this:

WITH CTE AS (
    SELECT *
        ,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY KeyFieldCNT) AS RowNBR
    FROM FakeTable
)

SELECT CTE.PersonID, CTE.PatientNM, 
      coalesce(CTE.BirthDTS, max(ft.BirthDTS)) BirthDTS, 
      coalesce(CTE.PhoneNBR, max(ft.PhoneNBR)) PhoneNBR,
      coalesce(CTE.EmailAddress, max(ft.EmailAddress)) EmailAddress,
      coalesce(CTE.StreetAddress, max(ft.StreetAddress)) StreetAddress,
      CTE.KeyFieldCNT
FROM CTE
LEFT JOIN FakeTable ft ON ft.PersonID = CTE.PersonID
WHERE RowNBR = 1
GROUP BY CTE.PersonID, CTE.PatientNM, CTE.BirthDTS, CTE.PhoneNBR,
         CTE.EmailAddress, CTE.StreetAddress, CTE.KeyFieldCNT

See it here:

https://dbfiddle.uk/Nqz61kul

Just be aware: this is mixing values from different rows, which is almost always a bad idea, and there is no meaningful selection for which row has priority to fill the missing data.

A more correct approach involves an additional scan of the table for each field you care about, so you can specify the priority and other criteria for filling in the missing values. But that will be much longer and crazy slow.

CodePudding user response：

I tried selecting each column individually with

SELECT TOP 1 <col> FROM ... WHERE <col> IS NOT NULL ORDER BY ... DESC

and it seems to do the trick, as follows:

;WITH X (PersonID) AS (
  SELECT DISTINCT PersonID FROM #FakeTable
)
SELECT X.PersonID
     , (SELECT TOP 1 F.PatientNM     FROM #FakeTable F WHERE F.PersonID = X.PersonID
               AND   F.PatientNM     IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
                AS     PatientNM
     , (SELECT TOP 1 F.BirthDTS      FROM #FakeTable F WHERE F.PersonID = X.PersonID
               AND   F.BirthDTS      IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
                AS     BirthDTS
     , (SELECT TOP 1 F.PhoneNBR      FROM #FakeTable F WHERE F.PersonID = X.PersonID
               AND   F.PhoneNBR      IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
                AS     PhoneNBR
     , (SELECT TOP 1 F.EmailAddress  FROM #FakeTable F WHERE F.PersonID = X.PersonID
               AND   F.EmailAddress  IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
                AS     EmailAddress
     , (SELECT TOP 1 F.StreetAddress FROM #FakeTable F WHERE F.PersonID = X.PersonID
               AND   F.StreetAddress IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
                AS     StreetAddress
     , (SELECT TOP 1 F.KeyFieldCNT   FROM #FakeTable F WHERE F.PersonID = X.PersonID
               AND   F.KeyFieldCNT   IS NOT NULL    ORDER BY F.KeyFieldCNT DESC)
                AS     KeyFieldCNT
FROM X

Results:

PersonID	PatientNM	BirthDTS	PhoneNBR	EmailAddress	StreetAddress	KeyFieldCNT
123456	Jerry Garcia	19421401	999-999-9999	[email protected]	123 Fake Street	11
7891011	Bob Weir	19471610	null	[email protected]	456 Nope Street	11

DB Fiddle: https://dbfiddle.uk/ErNqLqXw

For completeness I used the same pattern also for the KeyFieldCNT column, which can be probably be simplified.

Furthermore I don't have RowNBR, it can't be determined and has no meaning as there isn't a single row that provides all data for a result row.

CodePudding user response：

When you want/need the record with all (or the most) non-NULL values on top, you can do this:

select *,
  IIF(PersonId is null,0,1)  
  IIF(PatientNM is null,0,1)  
  IIF(BirthDTS is null,0,1)  
  IIF(PhoneNBR is null,0,1)  
  IIF(EmailAddress is null,0,1)  
  IIF(StreetAddress is null,0,1)  
  IIF(KeyFieldCNT is null,0,1) ColumnsWithData
from #FakeTable
ORDER BY 
  IIF(PersonId is null,0,1)  
  IIF(PatientNM is null,0,1)  
  IIF(BirthDTS is null,0,1)  
  IIF(PhoneNBR is null,0,1)  
  IIF(EmailAddress is null,0,1)  
  IIF(StreetAddress is null,0,1)  
  IIF(KeyFieldCNT is null,0,1)     DESC
  ;

see: DBFIDDLE