I have 3 SQL Server tables
- INC - Incident Tickets
- INT - Interaction tickets
- KB - Knowledge article views
All 3 tables have columns for user ID, ticket number, timestamp. I am trying to develop reporting that will identify when one or more rows exist in KB that have the same user ID and date as a row in either INC or INT. Ideally, my output would be a union of INC and INT with a new column that would list the KB Ticket number for each matching line in a comma separated field. For example given the following lines in each table:
INC Ticket Number | INC user ID | INC Date |
---|---|---|
INC1234 | id123 | 12/22/22 |
INC2345 | id123 | 12/22/22 |
KB Ticket Number | KB user ID | KB Date |
---|---|---|
KB1234 | id123 | 12/22/22 |
KB2345 | id123 | 12/22/22 |
I would get this output:
INC Ticket Number | INC user ID | INC Date | KB Tickets |
---|---|---|---|
INC1234 | id123 | 12/22/22 | KB1234,KB2345 |
INC2345 | id123 | 12/22/22 | KB1234,KB2345 |
The eventual destination for the output is going to be PowerBI. I initially tried to solve the issue in power query, but while I created a formula that successfully generated my desired output it was incredibly time and resource intensive as each table will have 1,000,000 or more lines and it took over 48 hours and never completed. I am trying to handle the comparison in the SQL query, but I am fairly new to SQL and can't seem to figure it out.
I got to the query below which will combine the 3 tables successfully, but only outputs a single match for each row:
select
inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.KBTicketNumber, kb.UpdateTime, kb.ViewedMMID
from
MMITMetrics.dbo.INC_IncidentTickets inc
full join
MMITMetrics.dbo.KB_Use kb on inc.Contact = kb.ViewedMMID
and cast(inc.OpenTime as date) = cast(kb.UpdateTime as date)
where
inc.OpenTime > '2021-01-01 12:00:00.000'
or kb.UpdateTime > '2021-01-01 12:00:00.000'
union
select
int.TicketNumber, int.OpenTime,int.Contact,
kb.KBTicketNumber, kb.UpdateTime, kb.ViewedMMID
from
MMITMetrics.dbo.INT_InteractionTickets int
full join
MMITMetrics.dbo.KB_Use kb on int.Contact = kb.ViewedMMID
and cast(int.OpenTime as date) = cast(kb.UpdateTime as date)
where
int.OpenTime > '2021-01-01 12:00:00.000'
or kb.UpdateTime > '2021-01-01 12:00:00.000'
I am using Microsoft SQL Server Management Studio 18, so I think I need to use the string_agg
function but can't get it to work properly.
CodePudding user response:
WITH INC_DATA(INC_TICKET_NUMBER,INC_USER_ID,INC_DATE)AS
(
SELECT 'INC1234','id123','2022-12-22' UNION ALL
SELECT 'INC2345','id123','2022-12-22'
),
KB_DATA(KB_TICKET_NUMBER,KB_USER_ID,KB_DATE)AS
(
SELECT 'KB1234','id123','2022-12-22'UNION ALL
SELECT 'KB2345','id123','2022-12-22'
)
SELECT I.INC_TICKET_NUMBER,I.INC_USER_ID,I.INC_DATE,STRING_AGG(K.KB_TICKET_NUMBER,',')KB
FROM INC_DATA AS I
JOIN KB_DATA AS K ON I.INC_USER_ID=K.KB_USER_ID AND I.INC_DATE=K.KB_DATE
GROUP BY I.INC_TICKET_NUMBER,I.INC_USER_ID,I.INC_DATE
Microsoft SQL Server 2017 (RTM-CU31) (KB5016884) - 14.0.3456.2 (X64) Sep 2 2022 11:01:50
Based on your sample data and description you can try something like above (SQL Server 2017 )
CodePudding user response:
You are right, you need string_agg. You could do the following:
select
inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.UpdateTime, STRING_AGG(kb.KBTicketNumber,',') AS KBTicketNumber
from inc
full join kb on inc.Contact = kb.ViewedMMID and CAST(inc.OpenTime as date) = CAST(kb.UpdateTime as date)
where inc.OpenTime > '2021-01-01 12:00:00.000' or kb.UpdateTime > '2021-01-01 12:00:00.000'
GROUP BY inc.TicketNumber, inc.OpenTime, inc.Contact, kb.UpdateTime
If you have an older version of SQL Server, that won't work, since you don't have support for that function. This might not be the best way, but in the past I've done things like this to reach the desired result:
SELECT n.TicketNumber, n.OpenTime, n.Contact,
n.UpdateTime,
MAX( CASE WHEN n.rnk = 1 THEN [KBTicketNumber] ELSE '' END )
MAX( CASE WHEN n.rnk = 2 THEN ',' [KBTicketNumber] ELSE '' END )
MAX( CASE WHEN n.rnk = 3 THEN ',' [KBTicketNumber] ELSE '' END )
MAX( CASE WHEN n.rnk = 4 THEN ',' [KBTicketNumber] ELSE '' END )
MAX( CASE WHEN n.rnk = 5 THEN ',' [KBTicketNumber] ELSE '' END ) AS [KBTicketNumber]
FROM (
select
inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.UpdateTime, kb.KBTicketNumber,
ROW_NUMBER() OVER (PARTITION BY inc.TicketNumber, inc.OpenTime, inc.Contact,
kb.UpdateTime ORDER BY kb.UpdateTime) AS rnk
from inc
full join kb on inc.Contact = kb.ViewedMMID and CAST(inc.OpenTime as date) = CAST(kb.UpdateTime as date)
where inc.OpenTime > '2021-01-01 12:00:00.000' or kb.UpdateTime > '2021-01-01 12:00:00.000'
) AS n
GROUP BY n.TicketNumber, n.OpenTime, n.Contact,
n.UpdateTime