Home > Software design >  SQL funky aggregations, seeking assistance
SQL funky aggregations, seeking assistance

Time:06-23

I have the following scenario

System Subsystem & Filename File Load Start Time File Load End Time
Alpha A1 transactiontxt 2022-06-19 08:00:00 2022-06-19 08:00:02
Alpha A2 userscsv 2022-06-19 08:00:02 2022-06-19 08:00:05
Alpha A2 employeescsv 2022-06-19 08:00:05 2022-06-19 08:00:08
Alpha A1 managerscsv 2022-06-19 08:00:00 2022-06-19 08:00:02
Alpha A3 customerscsv 2022-06-19 08:00:01 2022-06-19 08:00:04
Gamma A1 transactiontxt 2022-06-19 10:00:48 2022-06-19 10:00:53
Gamma A2 userscsv 2022-06-19 10:00:53 2022-06-19 10:00:54
Gamma A2 employeescsv 2022-06-19 10:00:27 2022-06-19 10:00:30
Gamma A1 managerscsv 2022-06-19 10:00:11 2022-06-19 10:00:17
Gamma A3 customerscsv 2022-06-19 10:00:13 2022-06-19 10:00:14

I want to be able to group the summary statistics by System. The info needed is when the overall started (earliest time), when it ended (latest time), and the time it took for each subsystem to occur, in seconds. From the example above, the result should look as below:

System Overall System Load Start Time Overall System Load End Time A1 Time Taken A2 Time Taken A3 Time Taken
Alpha 2022-06-19 08:00:00 2022-06-19 08:00:08 00:00:02 00:00:06 00:00:03
Gamma 2022-06-19 10:00:11 2022-06-19 10:00:54 00:00:11 00:00:04 00:00:01

I cannot find a way to do this in a query, I'm trying to do select subqueries in the select clause for each column, and at the end group by only System. But this is not possible because I'd have to use an aggregate function which is not being supported with case statements in subqueries in the select clause

My approach was something like

SELECT System, 
min(StartTime) as 'File Load Start Time',
max(EndTime) as 'File Load End Time', 
CASE WHEN SubSystem LIKE 'A1%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A1 Time Taken',
CASE WHEN SubSystem LIKE 'A2%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A2 Time Taken',
CASE WHEN SubSystem LIKE 'A3%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A3 Time Taken'
FROM TABLE GROUP BY SYSTEM

But this does not work because the case statements need to be in a group by clause as well, and I cannot aggregate them. Sum will not work because if the same exact time is used for two subsystems, I don't want it to be summed -- they were done at the same time.

CodePudding user response:

An aggregation function cannot be employed inside another aggregation function SUM(... MIN(...), MAX(...)). In order to simplify your query, you need to split the operations you carry out in:

  • detecting unique rows of your data
  • extracting the value of "Subsystem" from "Subsystem & Filename"
  • pivoting on the "Subsystem"
  • aggregating on your relevant fields
WITH cte AS (
    SELECT DISTINCT [System],
                    LEFT([Subsystem & Filename], 
                         CHARINDEX(' ', [Subsystem & Filename])-1) AS [Subsystem],
                    [File Load Start Time]                         AS [StartTime],
                    [File Load End Time]                           AS [EndTime]
    FROM tab
)   
SELECT [System],
       MIN([StartTime])        AS [Overall System Load Start Time],
       MAX([EndTime])          AS [Overall System Load End Time],
       SUM(CASE WHEN [Subsystem]='A1' 
                THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A1 Time Taken],
       SUM(CASE WHEN [Subsystem]='A2' 
                THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A2 Time Taken],
       SUM(CASE WHEN [Subsystem]='A3' 
                THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A3 Time Taken]
FROM cte
GROUP BY [System]

Check the demo here.

CodePudding user response:

You violate First Normal Form thad says data must be atomic into table's columns. Refer to Ted Codd (RDBMS inventor) : "The Relational Model for Database Management Version 2" (Addison-Wesley, 1990).

The fact thatbyou need to query a part of data inside a column proove that your data does not respect the rules !

You must first correct this point before trying to do a query.

One way is extractiong this value into a new colums like this :

WITH 
T_1FN AS
(SELECT *, LEFT([Subsystem & Filename], CHARINDEX(' ', [Subsystem & Filename]) - 1 ) AS Prefix
 FROM   MyTable)
SELECT ...

The your final query will be more simple to write...

  • Related