Home > Software design >  How to pivot data without aggregations and with variable amount of columns?
How to pivot data without aggregations and with variable amount of columns?

Time:01-08

I have the follwing table:

machineId periodId errorId
AGR0.37576 13 ERR561
AGR0.37576 14 ERR561
AGR0.37576 14 ERR458
AGR0.37576 15 ERR561
AGR0.37576 15 ERR458
AGR0.37576 16 ERR458
AGR0.37576 16 ERR561
AGR0.37576 17 ERR561
AGR0.37576 17 ERR458
AGR0.37576 18 ERR458
AGR0.37576 18 ERR561
AGR0.37576 19 ERR561
AGR0.37576 20 ERR561
  • "machineId" is a uniqueidentifier that identifies the machine we have at hand [AGR(agriculture)0(a tracktor).37576(the SN of the machine itself)]

  • "periodId" is a uniqueidentifier for the period: e.g. 13 (2021-12-31), 14 (2022-01-01), etc.

  • "errorId" is a uniqueidentidier for the error the machine has had in that period. e.g. ERR561 - overheating, ERR457 - coolant level very low, ERR458 - coolant level very low It can be one or more errors per period. If it is more thane one error a new record with the same period and the new error code is inserted.

I want to pivot and present the table like so:

machineId periodId ERROR1 ERROR2 ERROR3 ERROR4
AGR7.00012 9 ERR221 MIS061 ERG737 SER003
AGR0.37576 13 ERR561 NULL NULL NULL
ROAD.88887 13 ERR561 NULL NULL NULL
AGR0.37576 14 ERR845 ERR561 ERR737 NULL

I have two problems:

1. All I could find until now explains more or less how to pivot sales data to some period (Qs, HYs, etc). As you can see the only column, I can aggregate on is the periodId, which does not makes sense.

2. The amount of errors per period varies, I can have no errors per machine for a given period and 100 for another. Any ideas how to achieve this using SQL?

Sample data:

DROP TABLE IF EXISTS #sampleData
CREATE TABLE #sampleData ( [machineId] varchar(18), [periodId] int, [errorId] varchar(13) )

INSERT INTO #sampleData ([machineId], [periodId], [errorId])
VALUES
( 'AGR0.37576', 13, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR458' ),
( 'AGR0.37576', 14, 'ERR737' ),
( 'AGR0.37576', 15, 'ERR561' ),
( 'AGR0.37576', 15, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR561' ),
( 'AGR0.37576', 19, 'ERR561' ),
( 'AGR0.37576', 20, 'ERR561' ),
( 'AGR0.37576', 21, 'ERR561' ),
( 'AGR0.37576', 22, 'ERR561' ),
( 'AGR0.37576', 29, 'ERR561' ),
( 'AGR0.37576', 30, 'ERR561' ),
( 'AGR0.37576', 96, 'ERR561' ),
( 'AGR0.37576', 97, 'ERR561' ),
( 'AGR0.37576', 111, 'ERR561' ),
( 'AGR0.37576', 112, 'ERR561' ),
( 'AGR0.37576', 113, 'ERR561' ),
( 'AGR7.00012', 9, 'ERR221' ),
( 'AGR7.00012', 9, 'MIS061' ),
( 'AGR7.00012', 9, 'ERG737' ),
( 'AGR7.00012', 9, 'SER003' ),
( 'ROAD.88887', 13, 'ERR561' )

CodePudding user response:

I added two columns to the SampleData table:

  1. Id: I don't know the order in your table
  2. RowNum: To know how many columns I must add to the Temp table.

In general, you can play with this script and improve it.

DROP TABLE IF EXISTS #sampleData
CREATE TABLE #sampleData ( [machineId] varchar(18), [periodId] int, [errorId] varchar(13), Id INT IDENTITY, RowNum INT)

INSERT INTO #sampleData ([machineId], [periodId], [errorId])
VALUES
( 'AGR0.37576', 13, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR561' ),
( 'AGR0.37576', 14, 'ERR458' ),
( 'AGR0.37576', 14, 'ERR737' ),
( 'AGR0.37576', 15, 'ERR561' ),
( 'AGR0.37576', 15, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR458' ),
( 'AGR0.37576', 16, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR561' ),
( 'AGR0.37576', 17, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR458' ),
( 'AGR0.37576', 18, 'ERR561' ),
( 'AGR0.37576', 19, 'ERR561' ),
( 'AGR0.37576', 20, 'ERR561' ),
( 'AGR0.37576', 21, 'ERR561' ),
( 'AGR0.37576', 22, 'ERR561' ),
( 'AGR0.37576', 29, 'ERR561' ),
( 'AGR0.37576', 30, 'ERR561' ),
( 'AGR0.37576', 96, 'ERR561' ),
( 'AGR0.37576', 97, 'ERR561' ),
( 'AGR0.37576', 111, 'ERR561' ),
( 'AGR0.37576', 112, 'ERR561' ),
( 'AGR0.37576', 113, 'ERR561' ),
( 'AGR7.00012', 9, 'ERR221' ),
( 'AGR7.00012', 9, 'MIS061' ),
( 'AGR7.00012', 9, 'ERG737' ),
( 'AGR7.00012', 9, 'SER003' ),
( 'ROAD.88887', 13, 'ERR561' )


 ;WITH T1 AS (
                SELECT [machineId], [periodId], [errorId], RowNum,
                                ROW_NUMBER() OVER (PARTITION BY [machineId], [periodId] ORDER BY Id) AS RowNumber
                FROM #sampleData
                                    )
  UPDATE T1 SET RowNum = RowNumber


 CREATE TABLE #Temp (machineId VARCHAR(18), periodId INT)

 
 INSERT INTO #Temp (machineId, periodId)
 SELECT machineId, periodId
 FROM #sampleData
 GROUP BY machineId, periodId



 DECLARE @UpdateStatment AS NVARCHAR(MAX) = ''
 DECLARE @SQLString AS NVARCHAR(MAX) = ''

 SELECT @SQLString = @SQLString   ' ALTER TABLE #Temp ADD ERROR'   CAST(S1.RowNum AS NVARCHAR)   '  VARCHAR(13);', 
        @UpdateStatment = @UpdateStatment   
        'UPDATE T SET ERROR'   CAST(S1.RowNum AS NVARCHAR)   '=  S.ErrorId 
        FROM #Temp T 
                INNER JOIN #SampleData S ON T.machineId = S.machineId AND T.periodId = S.periodId WHERE S.RowNum = '   CAST(S1.RowNum AS NVARCHAR)   '; '
 FROM #SampleData S1 INNER JOIN 
           (SELECT [machineId], [periodId] 
            FROM #SampleData 
            GROUP BY [machineId], [periodId] 
            HAVING MAX(RowNum) >= ALL (SELECT RowNum FROM #SampleData) ) AS S2 ON S1.[machineId] = S2.[machineId] AND S1.[periodId]  = S2.[periodId] 
 ORDER BY S1.RowNum

 SELECT @SQLString
 SELECT @UpdateStatment

 EXEC SP_executesql @SQLString
 EXEC SP_executesql @UpdateStatment
 
 SELECT * FROM #Temp

--DROP TABLE #Temp
--DROP TABLE #SampleData


   



CodePudding user response:

There are a couple of ways to do this. If you have a finite number of errors then you could do something simple like this

-- static column version
SELECT * FROM 
    (
    SELECT 
            *
            , ColNum = CONCAT('Col', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) )
        FROM #sampleData 
    ) s
    PIVOT 
        (
            MIN(ErrorID) 
            FOR [ColNum] IN (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10)
        ) pvt 

I've added the 'Col' string to the start to make the pivot statement more readable (otherwise the column names would start with numbers).

This statis version will always produce the same number of columns even if they are all blank.

If you want it fully dynamic then you you will have to generate a list of the column names first, the final statement is almost the same as above.

-- dynamic version
-- Get a list of column names required
DECLARE @ColNums varchar(1000) = ''
SELECT  @ColNums = CONCAT(@ColNums, ColNum , ', ')
     FROM (SELECT DISTINCT ColNum FROM (
                SELECT 
                        *
                        , ColNum = CONCAT('Col', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) ) --column name per machine/period
                    FROM #sampleData ) c
            ) x

SET @ColNums = LEFT(@ColNums, LEN(@ColNums)-1) -- trim the final comma

-- Generate the final sql
DECLARE @sql varchar(max) = 
'SELECT * FROM 
    (SELECT *, ColNum = CONCAT(''Col'', ROW_NUMBER() OVER(PARTITION BY MachineID, PeriodID ORDER BY ErrorID) )
        FROM #sampleData ) s
    PIVOT 
        (MIN(ErrorID) FOR [ColNum] IN ('   @ColNums   ')) pvt 
    ORDER BY MachineID, PeriodID'

EXEC (@sql)

Note: The dynamic version also formats the ColNums so the numeric part is always 3 numbers long, this assumes you won't have more than 999 error columns !! If you don't need this much padding, then you can reduce the D3 to D2 in the format function.

This gives the following results

enter image description here

  • Related