Home > Software design >  Flatten table according to unique values of identifier
Flatten table according to unique values of identifier

Time:04-20

I have a table that looks like

id1 id2 id3 value1 value2 ...
1 1 1 x11 x12 ---
1 1 2 x21 x22 ---
2 1 1 x31 x32 ---
2 1 2 x41 x42 ---
2 1 3 x51 x52 ---

where each combination of id1-id2-id3 identifies a unique object.

I want to flatten this table with respect to the values of id3, with appropriately renamed columns. More precisely, I would like to obtain a table of the form

id1 id2 value1_id3=1 value2_id3=1 --- value1_id3=2 value2_id3=2 --- value1_id3=3 value2_id3=3 ---
1 1 x11 x12 --- x21 x22 ---
2 1 x31 x32 --- x41 x42 --- x51 x52 ---

Now each combination of id1-id2 identifies a unique object.

Is there someone that knows how something like this can be achieved in SQL?

Update: it would be convenient to have it work for an arbitrary number of value columns.

CodePudding user response:

You can use the CASE statement to extract the value from the relative columns, then apply the aggregation function STRING_AGG to merge the rows belonging to the same combination of id1, id2:

SELECT 
    id1,
    id2, 
    STRING_AGG(CASE WHEN id3=1 THEN value1 ELSE NULL END, '') AS value1_id3eq1,
    STRING_AGG(CASE WHEN id3=1 THEN value2 ELSE NULL END, '') AS value2_id3eq1,
    STRING_AGG(CASE WHEN id3=2 THEN value1 ELSE NULL END, '') AS value1_id3eq2,
    STRING_AGG(CASE WHEN id3=2 THEN value2 ELSE NULL END, '') AS value2_id3eq2,
    STRING_AGG(CASE WHEN id3=3 THEN value1 ELSE NULL END, '') AS value1_id3eq3,
    STRING_AGG(CASE WHEN id3=3 THEN value2 ELSE NULL END, '') AS value2_id3eq3
FROM 
    tab
GROUP BY 
    id1,
    id2

Here's a fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=386ff0b386c6a90bb1be19c5b4e9331e.

CodePudding user response:

As I mentioned in the comments, you'll need to use dynamic SQL if you have an arbitrary number of values for id3 and columns (I assumed you meant both).

This, in truth, makes for an ugly query. I also assume you are on a fully supported version of SQL Server. If not and thus you can't use STRING_AGG you'll need to use the "old" FOR XML PATH (and STUFF) method (I do not cover that here, as there are plenty of dupes out there, such as my own answer here).

I, honestly, am not going to explain what this does for the following reasons:

  1. Dynamic SQL is a powerful and dangerous tool; if you don't understand the following, you should not be using it.
  2. This should likely be a task for your presentation layer
  3. You have your "best friend" to help you debug and help you see what the actual SQL that is being run:
--Create sample table
CREATE TABLE dbo.YourTable (id1 int,
                            id2 int,
                            id3 int,
                            value1 char(3),
                            value2 char(3));

GO
--Insert sample data
INSERT INTO dbo.YourTable
VALUES (1,1,1,'x11','x12'),
       (1,1,2,'x21','x22'),
       (2,1,1,'x31','x32'),
       (2,1,2,'x41','x42'),
       (2,1,3,'x51','x52');
GO
--Solution
DECLARE @SQL nvarchar(MAX),
        @CRLF nchar(2) = NCHAR(13)   NCHAR(10);

DECLARE @Delim nchar(3) = ','   @CRLF;

SET @SQL = N'SELECT YT.id1,'   @CRLF   
           N'       YT.id2,'   @CRLF   
           (SELECT STRING_AGG(N'       MAX(CASE id3 WHEN '   QUOTENAME(yt.id3,'''')   N' THEN YT.'   QUOTENAME(c.[name])   N' END) AS '   QUOTENAME(CONCAT(c.name,'_id3=',yt.id3)),@Delim) WITHIN GROUP (ORDER BY yt.id3,c.[name])
            --Note, due to string ordering, value2 will have a GREATER value than value10
            FROM (SELECT DISTINCT id3
                  FROM dbo.YourTable) yt
                 CROSS JOIN (SELECT c.[name]
                             FROM sys.schemas s
                                  JOIN sys.tables t ON s.schema_id = t.schema_id
                                  JOIN sys.columns c ON t.object_id = c.object_id
                             WHERE s.[name] = N'dbo' --This would be your schema
                               AND t.[name] = N'YourTable' --This would be your table
                               AND c.[name] LIKE 'value%') c)   @CRLF  
           N'FROM dbo.YourTable YT'   @CRLF   
           N'GROUP BY YT.id1, YT.id2;';

--PRINT @SQL; --Your best friend
EXEC sys.sp_executesql @SQL;

GO
--Clean up
DROP TABLE dbo.YourTable;
GO
  • Related