I have the following scenario
System | Subsystem & Filename | File Load Start Time | File Load End Time |
---|---|---|---|
Alpha | A1 transactiontxt | 2022-06-19 08:00:00 | 2022-06-19 08:00:02 |
Alpha | A2 userscsv | 2022-06-19 08:00:02 | 2022-06-19 08:00:05 |
Alpha | A2 employeescsv | 2022-06-19 08:00:05 | 2022-06-19 08:00:08 |
Alpha | A1 managerscsv | 2022-06-19 08:00:00 | 2022-06-19 08:00:02 |
Alpha | A3 customerscsv | 2022-06-19 08:00:01 | 2022-06-19 08:00:04 |
Gamma | A1 transactiontxt | 2022-06-19 10:00:48 | 2022-06-19 10:00:53 |
Gamma | A2 userscsv | 2022-06-19 10:00:53 | 2022-06-19 10:00:54 |
Gamma | A2 employeescsv | 2022-06-19 10:00:27 | 2022-06-19 10:00:30 |
Gamma | A1 managerscsv | 2022-06-19 10:00:11 | 2022-06-19 10:00:17 |
Gamma | A3 customerscsv | 2022-06-19 10:00:13 | 2022-06-19 10:00:14 |
I want to be able to group the summary statistics by System. The info needed is when the overall started (earliest time), when it ended (latest time), and the time it took for each subsystem to occur, in seconds. From the example above, the result should look as below:
System | Overall System Load Start Time | Overall System Load End Time | A1 Time Taken | A2 Time Taken | A3 Time Taken |
---|---|---|---|---|---|
Alpha | 2022-06-19 08:00:00 | 2022-06-19 08:00:08 | 00:00:02 | 00:00:06 | 00:00:03 |
Gamma | 2022-06-19 10:00:11 | 2022-06-19 10:00:54 | 00:00:11 | 00:00:04 | 00:00:01 |
I cannot find a way to do this in a query, I'm trying to do select subqueries in the select clause for each column, and at the end group by only System. But this is not possible because I'd have to use an aggregate function which is not being supported with case statements in subqueries in the select clause
My approach was something like
SELECT System,
min(StartTime) as 'File Load Start Time',
max(EndTime) as 'File Load End Time',
CASE WHEN SubSystem LIKE 'A1%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A1 Time Taken',
CASE WHEN SubSystem LIKE 'A2%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A2 Time Taken',
CASE WHEN SubSystem LIKE 'A3%' THEN SUM(DATEDIFF(s, min(StartTime), max(EndTime))) Else 0 END AS 'A3 Time Taken'
FROM TABLE GROUP BY SYSTEM
But this does not work because the case statements need to be in a group by clause as well, and I cannot aggregate them. Sum will not work because if the same exact time is used for two subsystems, I don't want it to be summed -- they were done at the same time.
CodePudding user response:
An aggregation function cannot be employed inside another aggregation function SUM(... MIN(...), MAX(...))
. In order to simplify your query, you need to split the operations you carry out in:
- detecting unique rows of your data
- extracting the value of "Subsystem" from "Subsystem & Filename"
- pivoting on the "Subsystem"
- aggregating on your relevant fields
WITH cte AS (
SELECT DISTINCT [System],
LEFT([Subsystem & Filename],
CHARINDEX(' ', [Subsystem & Filename])-1) AS [Subsystem],
[File Load Start Time] AS [StartTime],
[File Load End Time] AS [EndTime]
FROM tab
)
SELECT [System],
MIN([StartTime]) AS [Overall System Load Start Time],
MAX([EndTime]) AS [Overall System Load End Time],
SUM(CASE WHEN [Subsystem]='A1'
THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A1 Time Taken],
SUM(CASE WHEN [Subsystem]='A2'
THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A2 Time Taken],
SUM(CASE WHEN [Subsystem]='A3'
THEN DATEDIFF(s, [StartTime], [EndTime]) END) AS [A3 Time Taken]
FROM cte
GROUP BY [System]
Check the demo here.
CodePudding user response:
You violate First Normal Form thad says data must be atomic into table's columns. Refer to Ted Codd (RDBMS inventor) : "The Relational Model for Database Management Version 2" (Addison-Wesley, 1990).
The fact thatbyou need to query a part of data inside a column proove that your data does not respect the rules !
You must first correct this point before trying to do a query.
One way is extractiong this value into a new colums like this :
WITH
T_1FN AS
(SELECT *, LEFT([Subsystem & Filename], CHARINDEX(' ', [Subsystem & Filename]) - 1 ) AS Prefix
FROM MyTable)
SELECT ...
The your final query will be more simple to write...