SQL - applying multiple pivots in one query, and finding the sum of resulting rows-CodePudding

I am attempting to create an SQL query which will compress multidimentional data using PIVOT, but need to do some operations on the pivoted data.

I have coding experience but very little experience with SQL and am using SQLPathfinder3 to create and execute queries, so what I am looking for is top level approach - what order of operations should I be going for here to achieve the desired result?

An example of my initial result table, gotten with SELECT, is as follows:

Item	Property	Subproperty	Thing
item1	propertyA	subproperty1	value1_A_1
item1	propertyA	subproperty2	value1_A_2
item1	propertyB	subproperty1	value1_B_1
item1	propertyB	subproperty2	value1_B_2
item2	propertyA	subproperty1	value2_A_1
item2	propertyA	subproperty2	value2_A_2
item2	propertyB	subproperty1	value2_B_1
item2	propertyB	subproperty2	value2_B_2

By executing one pivot, assigning Subproperty as the CTHEADER, and Thing as the CTVALUE, I can create this:

Item	Property	Subproperty1	Subproperty2
item1	propertyA	value1_A_1	value1_A_2
item1	propertyB	value1_B_1	value1_B_2
item2	propertyA	value2_A_1	value2_A_2
item2	propertyB	value2_B_1	value2_B_2

Similarly, I can pivot around "Subproperty" to get something analogous:

Item	Subproperty	Property A	Property B
item1	subproperty1	value1_A_1	value1_A_2
item1	subproperty2	value1_B_1	value1_B_2
item2	subproperty1	value2_A_1	value2_A_2
item2	subproperty2	value2_B_1	value2_B_2

The final result that I need is this - a combination of the above values:

Item	Property A	Property B
item1	value1_A_1 value1_A_2	value1_B_1 value1_B_2
item2	value2_A_1 value2_A_2	value2_B_1 value2_B_2

I don't know how to go about doing this in a single query, but I would very much like to, because this is something that needs to be automated.

I can't simply SELECT the sums that I want, because those values are on four separate lines. I could workaround in excel or python, especially if I can effect the result below, but I'm not sure how to do this.

Item	PropA Sub1	PropA Sub2	PropB Sub1	PropB Sub2
item1	value1_A_1	value1_A_2	value1_B_1	value1_B_2
item2	value2_A_1	value2_A_2	value2_B_1	value2_B_2

Any help, or pointers towards educational material would be appreciated, thanks.

CodePudding user response：

Try:

SELECT
    P.Item,
    STRING_AGG(P.propertyA, '   ') WITHIN GROUP (ORDER BY Subproperty) AS propertyA,
    STRING_AGG(P.propertyB, '   ') WITHIN GROUP (ORDER BY Subproperty) AS propertyB
FROM @Data
PIVOT (
    MAX(Thing)
    FOR Property IN (propertyA, propertyB)
) P
GROUP BY P.Item
ORDER BY P.Item

See this db<>fiddle.

The initial PIVOT will yield:

Item	Subproperty	propertyA	propertyB
item1	subproperty1	value1_A_1	value1_B_1
item2	subproperty1	value2_A_1	value2_B_1
item1	subproperty2	value1_A_2	value1_B_2
item2	subproperty2	value2_A_2	value2_B_2

and the GROUP BY and STRING_AGG() will yield the desired result:

Item	propertyA	propertyB
item1	value1_A_1 value1_A_2	value1_B_1 value1_B_2
item2	value2_A_1 value2_A_2	value2_B_1 value2_B_2

The WITHIN GROUP(...) clause is not actually required, but guarantees consistent results.

ADDENDUM: For the case where a sum of numbers is need, the STRING_AGG() WITHIN GROUP() can be replaced with a simple SUM(). Using SUM() instead of MAX() in the pivot is also recommended.

-- Pivot plus group by aggregation (Sum of numeric values)
SELECT
    P.Item,
    SUM(P.propertyA) AS propertyA,
    SUM(P.propertyB) AS propertyB
FROM @Data
PIVOT (
    SUM(Thing)
    FOR Property IN (propertyA, propertyB)
) P
GROUP BY P.Item
ORDER BY P.Item

The Query can be further simplified to a plain old PIVOT by eliminating unwanted columns from the initial select that feeds the PIVOT.

-- Using just a regular PIVOT after excluding unwanted columns
SELECT P.Item, P.propertyA, P.propertyB
FROM (SELECT Item, Property, Thing FROM @Data) D
PIVOT (
    SUM(Thing)
    FOR Property IN (propertyA, propertyB)
) P

Lastly (I hope), a technique called "Conditional Aggregation" can be used in place of the PIVOT. Sometimes this is actually easier to read, understand, and maintain.

-- Using conditional aggregation instead of PIVOT
SELECT
    D.Item,
    SUM(CASE WHEN D.Property = 'propertyA' THEN D.Thing END) AS propertyA,
    SUM(CASE WHEN D.Property = 'propertyB' THEN D.Thing END) AS propertyB
FROM @Data D
GROUP BY D.Item
ORDER BY D.Item

See this db<>fiddle for the above techniques.

CodePudding user response：

I would choose a dynamic approach, as property will have morwe than 2 item i guess.

I choose to use integers for thig as you wante4d sums

CREATE TABLE tab1
    ([Item] varchar(5), [Property] varchar(9), [Subproperty] varchar(12), [Thing] int)
;
    
INSERT INTO tab1
    ([Item], [Property], [Subproperty], [Thing])
VALUES
    ('item1', 'propertyA', 'subproperty1', 1),
    ('item1', 'propertyA', 'subproperty2', 2),
    ('item1', 'propertyB', 'subproperty1', 3),
    ('item1', 'propertyB', 'subproperty2', 4),
    ('item2', 'propertyA', 'subproperty1', 5),
    ('item2', 'propertyA', 'subproperty2', 6),
    ('item2', 'propertyB', 'subproperty1', 7),
    ('item2', 'propertyB', 'subproperty2', 8)
;

8 rows affected

DECLARE @cols AS NVARCHAR(MAX),
    @query  AS NVARCHAR(MAX);

SET @cols = (SELECT STRING_AGG([Property],',') FROM (SELECT DISTINCT [Property] FROM tab1 WHERE [Property] IS NOT NULL)t);

set @query = '
  
  SELECT [Item], '   @cols   ' from 
            (
                SELECT
                   [Item], [Property], SUM([Thing]) as SumThing
                 FROM tab1
                 GROUP BY [Item], [Property]
           ) x
            pivot 
            (
                 SUM(SumThing)
                for [Property] in ('   @cols   ')
            ) p ';

execute(@query);

Item	propertyA	propertyB
item1	3	7
item2	11	15

fiddle