How to pivot data without aggregations and with variable amount of columns?-CodePudding

I have the follwing table:

machineId	periodId	errorId
AGR0.37576	13	ERR561
AGR0.37576	14	ERR561
AGR0.37576	14	ERR458
AGR0.37576	15	ERR561
AGR0.37576	15	ERR458
AGR0.37576	16	ERR458
AGR0.37576	16	ERR561
AGR0.37576	17	ERR561
AGR0.37576	17	ERR458
AGR0.37576	18	ERR458
AGR0.37576	18	ERR561
AGR0.37576	19	ERR561
AGR0.37576	20	ERR561

"machineId" is a uniqueidentifier that identifies the machine we have at hand [AGR(agriculture)0(a tracktor).37576(the SN of the machine itself)]
"periodId" is a uniqueidentifier for the period: e.g. 13 (2021-12-31), 14 (2022-01-01), etc.
"errorId" is a uniqueidentidier for the error the machine has had in that period. e.g. ERR561 - overheating, ERR457 - coolant level very low, ERR458 - coolant level very low It can be one or more errors per period. If it is more thane one error a new record with the same period and the new error code is inserted.

I want to pivot and present the table like so:

machineId	periodId	ERROR1	ERROR2	ERROR3	ERROR4
AGR7.00012	9	ERR221	MIS061	ERG737	SER003
AGR0.37576	13	ERR561	NULL	NULL	NULL
ROAD.88887	13	ERR561	NULL	NULL	NULL
AGR0.37576	14	ERR845	ERR561	ERR737	NULL

I have two problems:

1. All I could find until now explains more or less how to pivot sales data to some period (Qs, HYs, etc). As you can see the only column, I can aggregate on is the periodId, which does not makes sense.

2. The amount of errors per period varies, I can have no errors per machine for a given period and 100 for another. Any ideas how to achieve this using SQL?

Sample data:

DROP TABLE IF EXISTS #sampleData
CREATE TABLE #sampleData ( [machineId] varchar(18), [periodId] int, [errorId] varchar(13) )

INSERT INTO #sampleData ([machineId], [periodId], [errorId])
VALUES
( 'AGR0.37576', 13, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR458' ),
( 'AGR0.37576', 14, 'ERR737' ),
( 'AGR0.37576', 15, 'ERR561' ),
( 'AGR0.37576', 15, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR561' ),
( 'AGR0.37576', 19, 'ERR561' ),
( 'AGR0.37576', 20, 'ERR561' ),
( 'AGR0.37576', 21, 'ERR561' ),
( 'AGR0.37576', 22, 'ERR561' ),
( 'AGR0.37576', 29, 'ERR561' ),
( 'AGR0.37576', 30, 'ERR561' ),
( 'AGR0.37576', 96, 'ERR561' ),
( 'AGR0.37576', 97, 'ERR561' ),
( 'AGR0.37576', 111, 'ERR561' ),
( 'AGR0.37576', 112, 'ERR561' ),
( 'AGR0.37576', 113, 'ERR561' ),
( 'AGR7.00012', 9, 'ERR221' ),
( 'AGR7.00012', 9, 'MIS061' ),
( 'AGR7.00012', 9, 'ERG737' ),
( 'AGR7.00012', 9, 'SER003' ),
( 'ROAD.88887', 13, 'ERR561' )

CodePudding user response：

I added two columns to the SampleData table:

Id: I don't know the order in your table
RowNum: To know how many columns I must add to the Temp table.

In general, you can play with this script and improve it.

DROP TABLE IF EXISTS #sampleData
CREATE TABLE #sampleData ( [machineId] varchar(18), [periodId] int, [errorId] varchar(13), Id INT IDENTITY, RowNum INT)

INSERT INTO #sampleData ([machineId], [periodId], [errorId])
VALUES
( 'AGR0.37576', 13, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR458' ),
( 'AGR0.37576', 14, 'ERR737' ),
( 'AGR0.37576', 15, 'ERR561' ),
( 'AGR0.37576', 15, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR561' ),
( 'AGR0.37576', 19, 'ERR561' ),
( 'AGR0.37576', 20, 'ERR561' ),
( 'AGR0.37576', 21, 'ERR561' ),
( 'AGR0.37576', 22, 'ERR561' ),
( 'AGR0.37576', 29, 'ERR561' ),
( 'AGR0.37576', 30, 'ERR561' ),
( 'AGR0.37576', 96, 'ERR561' ),
( 'AGR0.37576', 97, 'ERR561' ),
( 'AGR0.37576', 111, 'ERR561' ),
( 'AGR0.37576', 112, 'ERR561' ),
( 'AGR0.37576', 113, 'ERR561' ),
( 'AGR7.00012', 9, 'ERR221' ),
( 'AGR7.00012', 9, 'MIS061' ),
( 'AGR7.00012', 9, 'ERG737' ),
( 'AGR7.00012', 9, 'SER003' ),
( 'ROAD.88887', 13, 'ERR561' )


 ;WITH T1 AS (
                SELECT [machineId], [periodId], [errorId], RowNum,
                                ROW_NUMBER() OVER (PARTITION BY [machineId], [periodId] ORDER BY Id) AS RowNumber
                FROM #sampleData
                                    )
  UPDATE T1 SET RowNum = RowNumber


 CREATE TABLE #Temp (machineId VARCHAR(18), periodId INT)

 
 INSERT INTO #Temp (machineId, periodId)
 SELECT machineId, periodId
 FROM #sampleData
 GROUP BY machineId, periodId



 DECLARE @UpdateStatment AS NVARCHAR(MAX) = ''
 DECLARE @SQLString AS NVARCHAR(MAX) = ''

 SELECT @SQLString = @SQLString   ' ALTER TABLE #Temp ADD ERROR'   CAST(S1.RowNum AS NVARCHAR)   '  VARCHAR(13);', 
        @UpdateStatment = @UpdateStatment   
        'UPDATE T SET ERROR'   CAST(S1.RowNum AS NVARCHAR)   '=  S.ErrorId 
        FROM #Temp T 
                INNER JOIN #SampleData S ON T.machineId = S.machineId AND T.periodId = S.periodId WHERE S.RowNum = '   CAST(S1.RowNum AS NVARCHAR)   '; '
 FROM #SampleData S1 INNER JOIN 
           (SELECT [machineId], [periodId] 
            FROM #SampleData 
            GROUP BY [machineId], [periodId] 
            HAVING MAX(RowNum) >= ALL (SELECT RowNum FROM #SampleData) ) AS S2 ON S1.[machineId] = S2.[machineId] AND S1.[periodId]  = S2.[periodId] 
 ORDER BY S1.RowNum

 SELECT @SQLString
 SELECT @UpdateStatment

 EXEC SP_executesql @SQLString
 EXEC SP_executesql @UpdateStatment
 
 SELECT * FROM #Temp

--DROP TABLE #Temp
--DROP TABLE #SampleData

CodePudding user response：

There are a couple of ways to do this. If you have a finite number of errors then you could do something simple like this

-- static column version
SELECT * FROM 
    (
    SELECT 
            *
            , ColNum = CONCAT('Col', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) )
        FROM #sampleData 
    ) s
    PIVOT 
        (
            MIN(ErrorID) 
            FOR [ColNum] IN (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10)
        ) pvt

I've added the 'Col' string to the start to make the pivot statement more readable (otherwise the column names would start with numbers).

This statis version will always produce the same number of columns even if they are all blank.

If you want it fully dynamic then you you will have to generate a list of the column names first, the final statement is almost the same as above.

-- dynamic version
-- Get a list of column names required
DECLARE @ColNums varchar(1000) = ''
SELECT  @ColNums = CONCAT(@ColNums, ColNum , ', ')
     FROM (SELECT DISTINCT ColNum FROM (
                SELECT 
                        *
                        , ColNum = CONCAT('Col', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) ) --column name per machine/period
                    FROM #sampleData ) c
            ) x

SET @ColNums = LEFT(@ColNums, LEN(@ColNums)-1) -- trim the final comma

-- Generate the final sql
DECLARE @sql varchar(max) = 
'SELECT * FROM 
    (SELECT *, ColNum = CONCAT(''Col'', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) )
        FROM #sampleData ) s
    PIVOT 
        (MIN(ErrorID) FOR [ColNum] IN ('   @ColNums   ')) pvt 
    ORDER BY MachineID, PeriodID'

EXEC (@sql)

Note: The dynamic version also formats the ColNums so the numeric part is always 3 numbers long, this assumes you won't have more than 999 error columns !! If you don't need this much padding, then you can reduce the D3 to D2 in the format function.

This gives the following results