Creating query to reference prior row over grouped list of users-CodePudding

We have a collection of users with duplicates, and I'm writing a process to merge them. Basically selecting out all users with matching names and DOB's, then I need a list of user id's to merge them together. Here's an example:

CREATE TABLE #tmpUsers (UserID Integer NOT NULL PRIMARY KEY, FullName NVARCHAR(50), Birthdate DATE);

INSERT INTO #tmpUsers (UserID, FullName, Birthdate)
VALUES
(120,'John Michael','1985-03-02'),
(45,'John Michael','1985-03-02'),
(60,'John Michael','1985-03-02'),
(33,'John Michael','1985-03-02'),
(12,'Tim Smith','1973-01-02'),
(16,'Tim Smith','1973-01-02'),
(29,'Jane Thomas','1990-06-20'),
(43,'Jane Thomas','1990-06-20'),
(8,'Jane Thomas','1990-06-20');

The process I'm building needs to have a new table ordered by the Fullname and DOB, but have the current and prior ID so it can merge together, like this:

Name	DOB	Merge From	Merge To
Jane Thomas	1990-06-20	8	29
Jane Thomas	1990-06-20	29	43
John Michael	1985-03-02	33	45
John Michael	1985-03-02	45	60
John Michael	1985-03-02	60	120
Tim Smith	1973-01-02	12	16

The process basically merges or collapses the oldest values into the newest one, so in the end we will only have one User for each. I'm just unable to find any good way to do this, though I'm sure there's a simple TSQL method. I hoped someone had advise on how to build it.

In the end after my process runs it'll have three users with ID's 16, 43, 120. The others will either be removed or deactivated, but just getting the query to start the process is where I'm hung.

Thanks.

CodePudding user response：

This will do it:

SELECT * 
FROM (
    SELECT FullName as Name, BirthDate as DOB, UserID as [Merge From],
       LEAD(UserID) OVER(PARTITION BY fullname, birthdate 
                         ORDER BY fullname, birthdate, userid) as [Merge To]
    from #tmpUsers
) t 
WHERE [Merge To] IS NOT NULL
ORDER BY Name, DOB, [Merge From];

See it work here:

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c2557bc038cab44ab000a1b35ab1563b

CodePudding user response：

This is ubiquitously solved with row_number to select the max value per group:

with u as (
    select UserId, FullName, BirthDate,
        Row_Number() over(partition by FullName order by UserId desc) keepMe
    from #tmpUsers
)
select UserId, FullName, BirthDate
from u
where KeepMe=1

CodePudding user response：

Though my suggestion follows a different approach to solve the underlying problem, why not run this simple query,

SELECT
  MIN(UserID) AS MergeFrom,
  MAX(UserID) AS MergeTo,
  FullName,
  BirthDate
FROM #tmpusers
GROUP BY 
  FullName,
  BirthDate
HAVING MIN(UserID)<>MAX(UserID)

shift users as indicated and do this in a loop until the query returns an empty result set?