Home > OS >  Combine First and Last Names of people based on criteria
Combine First and Last Names of people based on criteria

Time:09-22

So I have an interesting scenario. There are 2 events, for which one of the Married EventType I'd like to join associated married couples of the house, grouped by an identifier (HouseID; 2 people in the same house), the EventType, and the EventDate. For the case such as an EventType of Birthday, the 2 residents of the house would not be combined in the same row. In the case of an EventType of Wedding, combine the name result of 2 rows (2 rules below based on LastName) into 1. It's possible to have only 1 person for a HouseID for the EventType of Wedding or Birthday, so therefore they would list as an individual row. The combining rules for EventType of Wedding would be as follows:

  1. If the last names are the same, the FinalName column result would be Will and Mary Stanton
  2. If the last names are different, the FinalName column result would be Stephen Jacobs and Janetsy Lilly.

The combining of the names would be contingent on the HouseID, EventType, and EventDate being the same, specific only to Wedding. This is because it's possible for 2 people to live in a house that are married, but not to each other which we base off the EventDate; we assume the EventDate is the indicator that they are married to each other. The example table input is as follows:

DECLARE @t TABLE (
    HouseID INT,
    FirstName NVARCHAR(64),
    LastName NVARCHAR(64),
    EventType NVARCHAR(64),
    EventDate DATE
);

INSERT INTO @t (HouseID, FirstName, LastName, EventType, EventDate)
VALUES
    (1, 'Will', 'Stanton', 'Birthday', '1974-01-05'),
    (1, 'Mary', 'Stanton', 'Birthday', '1980-05-22'),
    (2, 'Jason', 'Stockmore', 'Birthday', '1987-12-07'),
    (3, 'Mark', 'Mellony', 'Wedding', '2021-04-04'),
    (3, 'Stacy', 'Mellony', 'Wedding', '2021-04-04'),
    (4, 'Stephen', 'Johnson', 'Wedding', '2012-01-30'),
    (4, 'Janetsy', 'Johnson', 'Wedding', '2012-01-30'),
    (5, 'George', 'Jackson', 'Wedding', '2009-11-15'),
    (5, 'Sally', 'Mistmoore', 'Wedding', '2009-11-15'),
    (6, 'Sandy', 'Katz', 'Wedding', '2010-03-19'),
    (6, 'Jeff', 'Trilov', 'Wedding', '2016-09-09'),
    (7, 'Sandra', 'Kirchbaum', 'Wedding', '2011-05-22'),
    (8, 'Jessica', 'Bower', 'Birthday', '1996-02-26'),
    (8, 'Frank', 'Fjorn', 'Birthday', '1969-07-19');

The ideal result based on the input table would resemble:

| HouseID | FinalName                          | EventType | EventDate  |
| ------- | ---------------------------------- | --------- | ---------- |
| 1       | Mary Stanton                       | Birthday  | 1974-01-05 |
| 1       | Will Stanton                       | Birthday  | 1980-05-22 |
| 2       | Jason Stockmore                    | Birthday  | 1987-12-07 |
| 3       | Mark and Stacy Mellony             | Wedding   | 2021-04-04 |
| 4       | Stephen and Janetsy Johnson        | Wedding   | 2012-01-30 |
| 5       | George Jackson and Sally Mistmoore | Wedding   | 2009-11-15 |
| 6       | Sandy Katz                         | Wedding   | 2010-03-19 |
| 6       | Jeff Trilov                        | Wedding   | 2016-09-09 |
| 7       | Sandra Kirchbaum                   | Wedding   | 2011-05-22 |
| 8       | Jessica Bower                      | Birthday  | 1996-02-26 |
| 8       | Frank Fjorn                        | Birthday  | 1969-07-19 |

I have tried a couple of approaches; one of which is using an update statement to update the First Name and update a subsequent FirstName based on Row number using a previously set @Values variable, unioning the result of the types that do not combine (in this case, Birthday). Here is where I built the names then used the MAX() aggregation to select the larger result:

SELECT HouseID,
       FirstName
     , LastName
     , EventType
     , EventDate
     , RowNum = ROW_NUMBER() OVER (PARTITION BY LastName, EventType ORDER BY 1/0)
     , Values1 = CAST(NULL AS VARCHAR(MAX))
INTO #EntityValues1
FROM @t
WHERE EventType = 'Wedding'

UPDATE #EntityValues1
SET @Values1 = Values1 =
        CASE WHEN RowNum = 1 
            THEN FirstName
            ELSE @Values1   ' and '   FirstName 
        END

However this example only works with combining FirstName1 FirstName2 LastName (in a subsequent query with MAX(Values1) ' ' LastName. I had to do a subsequent query to take the approach where I am combining names that do not have the same last name. I know this particular query is a bit tricky, but I'm wondering if there's any magic I'm missing out on. I've seen some suggestions that use a FOR XML approach with STUFF involved, and some other suggestions, but this one appears to be a tough one.

CodePudding user response:

Here is an answer, with a working demo

;WITH
[DoubleWeddings] AS (
SELECT
            [HouseID]
    FROM
            @t
    WHERE
            [EventType] = 'Wedding'
    GROUP BY
            [HouseID],
            [EventDate]
        HAVING
            COUNT(*) = 2
),
[DoubleWeddingsSameLastName] AS (
SELECT
            T.[HouseID]
    FROM
            [DoubleWeddings] DW
        JOIN
            @t T
                ON T.[HouseID] = DW.[HouseID]
    GROUP BY
            T.[HouseID],
            T.[LastName]
        HAVING
            COUNT(*) = 2
),
[DoubleWeddingsDifferentLastName] AS (
SELECT [HouseID] FROM [DoubleWeddings]
EXCEPT
SELECT [HouseID] FROM [DoubleWeddingsSameLastName]
),
[Couples] AS ( 
SELECT
           T.[HouseID],
           ROW_NUMBER() OVER (PARTITION BY T.[HouseID] ORDER BY 1/0) [RN],
           T.[FirstName],
           T.[LastName],
           T.[EventType],
           T.[EventDate]
    FROM
           [DoubleWeddings] DW
        JOIN
           @t T
               ON T.[HouseID] = DW.[HouseID]
)
SELECT
            DWSL.[HouseID],
            FORMATMESSAGE(
                '%s and %s %s',
                F.[FirstName],
                S.[FirstName],
                S.[LastName]) [FinalName],
            F.[EventType],
            F.[EventDate]
    FROM
            [DoubleWeddingsSameLastName] DWSL
        JOIN
            [Couples] F
                ON F.[HouseID] = DWSL.[HouseID] AND F.[RN] = 1
        JOIN
            [Couples] S
                ON S.[HouseID] = DWSL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
            DWDL.[HouseID],
            FORMATMESSAGE(
                '%s %s and %s %s',
                F.[FirstName],
                F.[LastName],
                S.[FirstName],
                S.[LastName]) [FinalName],
            F.[EventType],
            F.[EventDate]
    FROM
            [DoubleWeddingsDifferentLastName] DWDL
        JOIN
            [Couples] F
                ON F.[HouseID] = DWDL.[HouseID] AND F.[RN] = 1
        JOIN
            [Couples] S
                ON S.[HouseID] = DWDL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
            T.[HouseID],
            FORMATMESSAGE(
                '%s %s',
                T.[FirstName],
                T.[LastName]) [FinalName],
            T.[EventType],
            T.[EventDate]
    FROM
            @t T
        LEFT JOIN
            [DoubleWeddings] DW
                ON DW.[HouseID] = T.[HouseID]
    WHERE
            DW.[HouseID] IS NULL
    ORDER BY
            [HouseID],
            [EventDate],
            [FinalName]

There is one issue, I've used the ORDER BY 1/0 technique to skip providing an order for the ROW_NUMBER() which tends to assign the row numbers in the order of the underlying data. However, this is not guaranteed, and could very depending on the parallelization of the query.

It would be better if the order of the combination was provided by a column in the data, however, none is present in the example.

CodePudding user response:

To get this event calendar:

select 
  t.HouseID,
  CONCAT(STRING_AGG(CONCAT(FirstName,
                           ' ',
                           CASE WHEN ln.LastName<>t.LastName 
                                then t.LastName END),' and '),
         ' ',
         MIN(t.LastName)) as Name,
  t.EventType,
  t.EventDate
from t
left join (select t1.HouseID, min(t1.LastName) as LastName from t as t1 GROUP BY t1.HouseID having count(t1.LastName)=1) ln on ln.HouseID = t.HouseID
GROUP BY t.EventDate,  t.EventType, t.HouseID
order by month(t.EventDate), day(t.EventDate);

output:

HouseID Name EventType EventDate
1 Will Stanton Birthday 1974-01-05
4 Stephen and Janetsy Johnson Wedding 2012-01-30
8 Jessica Bower Birthday 1996-02-26
6 Sandy Katz Wedding 2010-03-19
3 Mark and Stacy Mellony Wedding 2021-04-04
7 Sandra Kirchbaum Wedding 2011-05-22
1 Mary Stanton Birthday 1980-05-22
8 Frank Fjorn Birthday 1969-07-19
6 Jeff Trilov Wedding 2016-09-09
5 George and Sally Jackson Wedding 2009-11-15
2 Jason Stockmore Birthday 1987-12-07

see: DBFIDDLE

EDIT:

  • Corrected the columnname (Name to FinalName), and the ordering of the results.
  • Fixed the value for HouseId=5
select 
  t.HouseID,
  CONCAT(STRING_AGG(CONCAT(FirstName,
                           ' ',
                           ISNULL(ln.LastName, t.LastName)
                                ),' and '),
         ' ',
         MIN(ln.LastName)) as FinalName,
  t.EventType,
  t.EventDate
from t
left join (select t1.HouseID, min(t1.LastName) as LastName from t as t1 GROUP BY t1.HouseID having count(t1.LastName)=1) ln on ln.HouseID = t.HouseID
GROUP BY t.EventDate,  t.EventType, t.HouseID
order by HouseID, EventDate
;

see: DBFIDDLE

output:

HouseID FinalName EventType EventDate
1 Will Stanton Birthday 1974-01-05
1 Mary Stanton Birthday 1980-05-22
2 Jason Stockmore Stockmore Birthday 1987-12-07
3 Mark Mellony and Stacy Mellony Wedding 2021-04-04
4 Stephen Johnson and Janetsy Johnson Wedding 2012-01-30
5 George Jackson and Sally Mistmoore Wedding 2009-11-15
6 Sandy Katz Wedding 2010-03-19
6 Jeff Trilov Wedding 2016-09-09
7 Sandra Kirchbaum Kirchbaum Wedding 2011-05-22
8 Frank Fjorn Birthday 1969-07-19
8 Jessica Bower Birthday 1996-02-26
  • Related