Home > Enterprise >  SQL advanced grouping & case statements , is this possible?
SQL advanced grouping & case statements , is this possible?

Time:06-20

I have the following scenario

System Subsystem & Filename File Load Start Time File Load End Time
Alpha A1 transactiontxt 2022-06-19 08:00:00 2022-06-19 08:00:02
Alpha A2 userscsv 2022-06-19 08:00:02 2022-06-19 08:00:05
Alpha A2 employeescsv 2022-06-19 08:00:05 2022-06-19 08:00:08
Alpha A1 managerscsv 2022-06-19 08:00:08 2022-06-19 08:00:16
Alpha A3 customerscsv 2022-06-19 08:00:01 2022-06-19 08:00:04
Gamma A1 transactiontxt 2022-06-19 10:00:48 2022-06-19 10:00:53
Gamma A2 userscsv 2022-06-19 10:00:53 2022-06-19 10:00:54
Gamma A2 employeescsv 2022-06-19 10:00:27 2022-06-19 10:00:30
Gamma A1 managerscsv 2022-06-19 10:00:11 2022-06-19 10:00:17
Gamma A3 customerscsv 2022-06-19 10:00:13 2022-06-19 10:00:14

I want to be able to group the summary statistics by System. The info needed is when the overall started (earliest time), when it ended (latest time), and the time it took for each subsystem to occur, in seconds. From the example above, the result should look as below:

System Overall System Load Start Time Overall System Load End Time A1 Time Taken A2 Time Taken A3 Time Taken
Alpha 2022-06-19 08:00:00 2022-06-19 08:00:16 00:00:10 00:00:06 00:00:03
Gamma 2022-06-19 10:00:11 2022-06-19 10:00:54 00:00:11 00:00:04 00:00:01

I cannot find a way to do this in a query, I'm trying to do select subqueries in the select clause for each column, and at the end group by only System. But this is not possible because I'd have to use an aggregate function which is not being supported with case statements in subqueries in the select clause

My approach was something like

SELECT System, 
min(StartTime) as 'File Load Start Time',
max(EndTime) as 'File Load End Time', 
CASE WHEN SubSystem LIKE 'A1%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A1 Time Taken',
CASE WHEN SubSystem LIKE 'A2%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A2 Time Taken',
CASE WHEN SubSystem LIKE 'A3%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A3 Time Taken'
FROM TABLE GROUP BY SYSTEM

But this does not work because the case statements need to be in a group by clause as well, and I cannot aggregate them

CodePudding user response:

I am assuming your subsystems are finite and doesn't require dynamic columns (pivot/crosstab).

Folloing query should give you the desired output.

SELECT
    system_name AS "System", 
    min(file_load_start_time) AS "Overall System Load Start Time", 
    max(file_load_end_time) AS "Overall System Load End Time",
    (
    SELECT
        sec_to_time(sum(timestampdiff(SECOND, tsl_a1.file_load_start_time, tsl_a1.file_load_end_time)))
    FROM
        t_system_log tsl_a1
    WHERE
        tsl_a1.system_name = tsl.system_name
        AND LEFT(tsl_a1.subsystem_filename,
        2) = 'A1') AS "A1 Time Taken",
    (
    SELECT
        sec_to_time(sum(timestampdiff(SECOND, tsl_a2.file_load_start_time, tsl_a2.file_load_end_time)))
    FROM
        t_system_log tsl_a2
    WHERE
        tsl_a2.system_name = tsl.system_name
        AND LEFT(tsl_a2.subsystem_filename,
        2) = 'A2') AS "A2 Time Taken",
    (
    SELECT
        sec_to_time(sum(timestampdiff(SECOND, tsl_a3.file_load_start_time, tsl_a3.file_load_end_time)))
    FROM
        t_system_log tsl_a3
    WHERE
        tsl_a3.system_name = tsl.system_name
        AND LEFT(tsl_a3.subsystem_filename,
        2) = 'A3') AS "A3 Time Taken"
FROM
    t_system_log tsl
GROUP BY
    system_name;

In case, your subsystems are dynamic, you should use pivot query instead of subquery. Also subquery might impact on the performance.

My Fiddle

CodePudding user response:

Your original query just needs a little bit of tweaking in order to work properly. I tested it in workbench. It works now.

SELECT System, 
    min(StartTime) as 'File Load Start Time',
    max(EndTime) as 'File Load End Time', 
    sum( case when left(subsystem,2)='a1' then to_seconds(endtime)-to_seconds(starttime) else 0 end 
    ) as 'A1 Time Taken',
    sum( case when left(subsystem,2)='a2' then to_seconds(endtime)-to_seconds(starttime) else 0 end 
    ) as 'A2 Time Taken',
    sum( case when left(subsystem,2)='a3' then to_seconds(endtime)-to_seconds(starttime) else 0 end 
    ) as 'A3 Time Taken'
FROM TABLE  GROUP BY SYSTEM
;
  • Related