Home > Mobile >  Need help trying to combine 2 sql tables and join to a 3rd with a variable number of matches per row
Need help trying to combine 2 sql tables and join to a 3rd with a variable number of matches per row

Time:01-05

I have 3 SQL Server tables

  • INC - Incident Tickets
  • INT - Interaction tickets
  • KB - Knowledge article views

All 3 tables have columns for user ID, ticket number, timestamp. I am trying to develop reporting that will identify when one or more rows exist in KB that have the same user ID and date as a row in either INC or INT. Ideally, my output would be a union of INC and INT with a new column that would list the KB Ticket number for each matching line in a comma separated field. For example given the following lines in each table:

INC Ticket Number INC user ID INC Date
INC1234 id123 12/22/22
INC2345 id123 12/22/22
KB Ticket Number KB user ID KB Date
KB1234 id123 12/22/22
KB2345 id123 12/22/22

I would get this output:

INC Ticket Number INC user ID INC Date KB Tickets
INC1234 id123 12/22/22 KB1234,KB2345
INC2345 id123 12/22/22 KB1234,KB2345

The eventual destination for the output is going to be PowerBI. I initially tried to solve the issue in power query, but while I created a formula that successfully generated my desired output it was incredibly time and resource intensive as each table will have 1,000,000 or more lines and it took over 48 hours and never completed. I am trying to handle the comparison in the SQL query, but I am fairly new to SQL and can't seem to figure it out.

I got to the query below which will combine the 3 tables successfully, but only outputs a single match for each row:

select 
    inc.TicketNumber, inc.OpenTime, inc.Contact,
    kb.KBTicketNumber, kb.UpdateTime, kb.ViewedMMID
from 
    MMITMetrics.dbo.INC_IncidentTickets inc
full join  
    MMITMetrics.dbo.KB_Use kb on inc.Contact = kb.ViewedMMID 
                              and cast(inc.OpenTime as date) = cast(kb.UpdateTime as date)
where 
    inc.OpenTime > '2021-01-01 12:00:00.000' 
    or kb.UpdateTime > '2021-01-01 12:00:00.000'

union 

select 
    int.TicketNumber, int.OpenTime,int.Contact,
    kb.KBTicketNumber, kb.UpdateTime, kb.ViewedMMID
from 
    MMITMetrics.dbo.INT_InteractionTickets int 
full join  
    MMITMetrics.dbo.KB_Use kb on int.Contact = kb.ViewedMMID 
                              and cast(int.OpenTime as date) = cast(kb.UpdateTime as date)
where 
    int.OpenTime > '2021-01-01 12:00:00.000' 
    or kb.UpdateTime > '2021-01-01 12:00:00.000'

I am using Microsoft SQL Server Management Studio 18, so I think I need to use the string_agg function but can't get it to work properly.

CodePudding user response:

WITH INC_DATA(INC_TICKET_NUMBER,INC_USER_ID,INC_DATE)AS
(
  SELECT 'INC1234','id123','2022-12-22' UNION ALL
  SELECT 'INC2345','id123','2022-12-22'
),
KB_DATA(KB_TICKET_NUMBER,KB_USER_ID,KB_DATE)AS
(
   SELECT 'KB1234','id123','2022-12-22'UNION ALL
   SELECT 'KB2345','id123','2022-12-22'
)
SELECT I.INC_TICKET_NUMBER,I.INC_USER_ID,I.INC_DATE,STRING_AGG(K.KB_TICKET_NUMBER,',')KB
FROM INC_DATA AS I
JOIN KB_DATA AS K ON I.INC_USER_ID=K.KB_USER_ID AND I.INC_DATE=K.KB_DATE
GROUP BY I.INC_TICKET_NUMBER,I.INC_USER_ID,I.INC_DATE

Microsoft SQL Server 2017 (RTM-CU31) (KB5016884) - 14.0.3456.2 (X64) Sep 2 2022 11:01:50

Based on your sample data and description you can try something like above (SQL Server 2017 )

CodePudding user response:

You are right, you need string_agg. You could do the following:

select 
inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.UpdateTime, STRING_AGG(kb.KBTicketNumber,',') AS KBTicketNumber

from inc
full join  kb on inc.Contact = kb.ViewedMMID and CAST(inc.OpenTime as date) = CAST(kb.UpdateTime as date)
where inc.OpenTime > '2021-01-01 12:00:00.000' or kb.UpdateTime > '2021-01-01 12:00:00.000'

GROUP BY inc.TicketNumber, inc.OpenTime, inc.Contact, kb.UpdateTime

If you have an older version of SQL Server, that won't work, since you don't have support for that function. This might not be the best way, but in the past I've done things like this to reach the desired result:

SELECT n.TicketNumber, n.OpenTime, n.Contact,
n.UpdateTime, 
MAX( CASE WHEN n.rnk = 1 THEN [KBTicketNumber] ELSE '' END )   
MAX( CASE WHEN n.rnk = 2 THEN ','   [KBTicketNumber] ELSE '' END )   
MAX( CASE WHEN n.rnk = 3 THEN ','   [KBTicketNumber] ELSE '' END )   
MAX( CASE WHEN n.rnk = 4 THEN ','   [KBTicketNumber] ELSE '' END )   
MAX( CASE WHEN n.rnk = 5 THEN ','   [KBTicketNumber] ELSE '' END ) AS [KBTicketNumber]
FROM (
select 
inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.UpdateTime, kb.KBTicketNumber,
ROW_NUMBER() OVER (PARTITION BY inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.UpdateTime ORDER BY kb.UpdateTime) AS rnk
from inc
full join  kb on inc.Contact = kb.ViewedMMID and CAST(inc.OpenTime as date) = CAST(kb.UpdateTime as date)
where inc.OpenTime > '2021-01-01 12:00:00.000' or kb.UpdateTime > '2021-01-01 12:00:00.000'
) AS n

GROUP BY n.TicketNumber, n.OpenTime, n.Contact,
n.UpdateTime
  • Related