I have the follwing table:
machineId | periodId | errorId |
---|---|---|
AGR0.37576 | 13 | ERR561 |
AGR0.37576 | 14 | ERR561 |
AGR0.37576 | 14 | ERR458 |
AGR0.37576 | 15 | ERR561 |
AGR0.37576 | 15 | ERR458 |
AGR0.37576 | 16 | ERR458 |
AGR0.37576 | 16 | ERR561 |
AGR0.37576 | 17 | ERR561 |
AGR0.37576 | 17 | ERR458 |
AGR0.37576 | 18 | ERR458 |
AGR0.37576 | 18 | ERR561 |
AGR0.37576 | 19 | ERR561 |
AGR0.37576 | 20 | ERR561 |
"machineId" is a uniqueidentifier that identifies the machine we have at hand [AGR(agriculture)0(a tracktor).37576(the SN of the machine itself)]
"periodId" is a uniqueidentifier for the period: e.g. 13 (2021-12-31), 14 (2022-01-01), etc.
"errorId" is a uniqueidentidier for the error the machine has had in that period. e.g. ERR561 - overheating, ERR457 - coolant level very low, ERR458 - coolant level very low It can be one or more errors per period. If it is more thane one error a new record with the same period and the new error code is inserted.
I want to pivot and present the table like so:
machineId | periodId | ERROR1 | ERROR2 | ERROR3 | ERROR4 |
---|---|---|---|---|---|
AGR7.00012 | 9 | ERR221 | MIS061 | ERG737 | SER003 |
AGR0.37576 | 13 | ERR561 | NULL | NULL | NULL |
ROAD.88887 | 13 | ERR561 | NULL | NULL | NULL |
AGR0.37576 | 14 | ERR845 | ERR561 | ERR737 | NULL |
I have two problems:
1. All I could find until now explains more or less how to pivot sales data to some period (Qs, HYs, etc). As you can see the only column, I can aggregate on is the periodId, which does not makes sense.
2. The amount of errors per period varies, I can have no errors per machine for a given period and 100 for another. Any ideas how to achieve this using SQL?
Sample data:
DROP TABLE IF EXISTS #sampleData
CREATE TABLE #sampleData ( [machineId] varchar(18), [periodId] int, [errorId] varchar(13) )
INSERT INTO #sampleData ([machineId], [periodId], [errorId])
VALUES
( 'AGR0.37576', 13, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR458' ),
( 'AGR0.37576', 14, 'ERR737' ),
( 'AGR0.37576', 15, 'ERR561' ),
( 'AGR0.37576', 15, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR561' ),
( 'AGR0.37576', 19, 'ERR561' ),
( 'AGR0.37576', 20, 'ERR561' ),
( 'AGR0.37576', 21, 'ERR561' ),
( 'AGR0.37576', 22, 'ERR561' ),
( 'AGR0.37576', 29, 'ERR561' ),
( 'AGR0.37576', 30, 'ERR561' ),
( 'AGR0.37576', 96, 'ERR561' ),
( 'AGR0.37576', 97, 'ERR561' ),
( 'AGR0.37576', 111, 'ERR561' ),
( 'AGR0.37576', 112, 'ERR561' ),
( 'AGR0.37576', 113, 'ERR561' ),
( 'AGR7.00012', 9, 'ERR221' ),
( 'AGR7.00012', 9, 'MIS061' ),
( 'AGR7.00012', 9, 'ERG737' ),
( 'AGR7.00012', 9, 'SER003' ),
( 'ROAD.88887', 13, 'ERR561' )
CodePudding user response:
I added two columns to the SampleData table:
- Id: I don't know the order in your table
- RowNum: To know how many columns I must add to the Temp table.
In general, you can play with this script and improve it.
DROP TABLE IF EXISTS #sampleData
CREATE TABLE #sampleData ( [machineId] varchar(18), [periodId] int, [errorId] varchar(13), Id INT IDENTITY, RowNum INT)
INSERT INTO #sampleData ([machineId], [periodId], [errorId])
VALUES
( 'AGR0.37576', 13, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR458' ),
( 'AGR0.37576', 14, 'ERR737' ),
( 'AGR0.37576', 15, 'ERR561' ),
( 'AGR0.37576', 15, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR561' ),
( 'AGR0.37576', 19, 'ERR561' ),
( 'AGR0.37576', 20, 'ERR561' ),
( 'AGR0.37576', 21, 'ERR561' ),
( 'AGR0.37576', 22, 'ERR561' ),
( 'AGR0.37576', 29, 'ERR561' ),
( 'AGR0.37576', 30, 'ERR561' ),
( 'AGR0.37576', 96, 'ERR561' ),
( 'AGR0.37576', 97, 'ERR561' ),
( 'AGR0.37576', 111, 'ERR561' ),
( 'AGR0.37576', 112, 'ERR561' ),
( 'AGR0.37576', 113, 'ERR561' ),
( 'AGR7.00012', 9, 'ERR221' ),
( 'AGR7.00012', 9, 'MIS061' ),
( 'AGR7.00012', 9, 'ERG737' ),
( 'AGR7.00012', 9, 'SER003' ),
( 'ROAD.88887', 13, 'ERR561' )
;WITH T1 AS (
SELECT [machineId], [periodId], [errorId], RowNum,
ROW_NUMBER() OVER (PARTITION BY [machineId], [periodId] ORDER BY Id) AS RowNumber
FROM #sampleData
)
UPDATE T1 SET RowNum = RowNumber
CREATE TABLE #Temp (machineId VARCHAR(18), periodId INT)
INSERT INTO #Temp (machineId, periodId)
SELECT machineId, periodId
FROM #sampleData
GROUP BY machineId, periodId
DECLARE @UpdateStatment AS NVARCHAR(MAX) = ''
DECLARE @SQLString AS NVARCHAR(MAX) = ''
SELECT @SQLString = @SQLString ' ALTER TABLE #Temp ADD ERROR' CAST(S1.RowNum AS NVARCHAR) ' VARCHAR(13);',
@UpdateStatment = @UpdateStatment
'UPDATE T SET ERROR' CAST(S1.RowNum AS NVARCHAR) '= S.ErrorId
FROM #Temp T
INNER JOIN #SampleData S ON T.machineId = S.machineId AND T.periodId = S.periodId WHERE S.RowNum = ' CAST(S1.RowNum AS NVARCHAR) '; '
FROM #SampleData S1 INNER JOIN
(SELECT [machineId], [periodId]
FROM #SampleData
GROUP BY [machineId], [periodId]
HAVING MAX(RowNum) >= ALL (SELECT RowNum FROM #SampleData) ) AS S2 ON S1.[machineId] = S2.[machineId] AND S1.[periodId] = S2.[periodId]
ORDER BY S1.RowNum
SELECT @SQLString
SELECT @UpdateStatment
EXEC SP_executesql @SQLString
EXEC SP_executesql @UpdateStatment
SELECT * FROM #Temp
--DROP TABLE #Temp
--DROP TABLE #SampleData
CodePudding user response:
There are a couple of ways to do this. If you have a finite number of errors then you could do something simple like this
-- static column version
SELECT * FROM
(
SELECT
*
, ColNum = CONCAT('Col', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) )
FROM #sampleData
) s
PIVOT
(
MIN(ErrorID)
FOR [ColNum] IN (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10)
) pvt
I've added the 'Col' string to the start to make the pivot statement more readable (otherwise the column names would start with numbers).
This statis version will always produce the same number of columns even if they are all blank.
If you want it fully dynamic then you you will have to generate a list of the column names first, the final statement is almost the same as above.
-- dynamic version
-- Get a list of column names required
DECLARE @ColNums varchar(1000) = ''
SELECT @ColNums = CONCAT(@ColNums, ColNum , ', ')
FROM (SELECT DISTINCT ColNum FROM (
SELECT
*
, ColNum = CONCAT('Col', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) ) --column name per machine/period
FROM #sampleData ) c
) x
SET @ColNums = LEFT(@ColNums, LEN(@ColNums)-1) -- trim the final comma
-- Generate the final sql
DECLARE @sql varchar(max) =
'SELECT * FROM
(SELECT *, ColNum = CONCAT(''Col'', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) )
FROM #sampleData ) s
PIVOT
(MIN(ErrorID) FOR [ColNum] IN (' @ColNums ')) pvt
ORDER BY MachineID, PeriodID'
EXEC (@sql)
Note: The dynamic version also formats the ColNums so the numeric part is always 3 numbers long, this assumes you won't have more than 999 error columns !! If you don't need this much padding, then you can reduce the D3
to D2
in the format function.
This gives the following results