I am attempting to create an SQL query which will compress multidimentional data using PIVOT, but need to do some operations on the pivoted data.
I have coding experience but very little experience with SQL and am using SQLPathfinder3 to create and execute queries, so what I am looking for is top level approach - what order of operations should I be going for here to achieve the desired result?
An example of my initial result table, gotten with SELECT, is as follows:
Item | Property | Subproperty | Thing |
---|---|---|---|
item1 | propertyA | subproperty1 | value1_A_1 |
item1 | propertyA | subproperty2 | value1_A_2 |
item1 | propertyB | subproperty1 | value1_B_1 |
item1 | propertyB | subproperty2 | value1_B_2 |
item2 | propertyA | subproperty1 | value2_A_1 |
item2 | propertyA | subproperty2 | value2_A_2 |
item2 | propertyB | subproperty1 | value2_B_1 |
item2 | propertyB | subproperty2 | value2_B_2 |
By executing one pivot, assigning Subproperty as the CTHEADER, and Thing as the CTVALUE, I can create this:
Item | Property | Subproperty1 | Subproperty2 |
---|---|---|---|
item1 | propertyA | value1_A_1 | value1_A_2 |
item1 | propertyB | value1_B_1 | value1_B_2 |
item2 | propertyA | value2_A_1 | value2_A_2 |
item2 | propertyB | value2_B_1 | value2_B_2 |
Similarly, I can pivot around "Subproperty" to get something analogous:
Item | Subproperty | Property A | Property B |
---|---|---|---|
item1 | subproperty1 | value1_A_1 | value1_A_2 |
item1 | subproperty2 | value1_B_1 | value1_B_2 |
item2 | subproperty1 | value2_A_1 | value2_A_2 |
item2 | subproperty2 | value2_B_1 | value2_B_2 |
The final result that I need is this - a combination of the above values:
Item | Property A | Property B |
---|---|---|
item1 | value1_A_1 value1_A_2 | value1_B_1 value1_B_2 |
item2 | value2_A_1 value2_A_2 | value2_B_1 value2_B_2 |
I don't know how to go about doing this in a single query, but I would very much like to, because this is something that needs to be automated.
I can't simply SELECT the sums that I want, because those values are on four separate lines. I could workaround in excel or python, especially if I can effect the result below, but I'm not sure how to do this.
Item | PropA Sub1 | PropA Sub2 | PropB Sub1 | PropB Sub2 |
---|---|---|---|---|
item1 | value1_A_1 | value1_A_2 | value1_B_1 | value1_B_2 |
item2 | value2_A_1 | value2_A_2 | value2_B_1 | value2_B_2 |
Any help, or pointers towards educational material would be appreciated, thanks.
CodePudding user response:
Try:
SELECT
P.Item,
STRING_AGG(P.propertyA, ' ') WITHIN GROUP (ORDER BY Subproperty) AS propertyA,
STRING_AGG(P.propertyB, ' ') WITHIN GROUP (ORDER BY Subproperty) AS propertyB
FROM @Data
PIVOT (
MAX(Thing)
FOR Property IN (propertyA, propertyB)
) P
GROUP BY P.Item
ORDER BY P.Item
See this db<>fiddle.
The initial PIVOT
will yield:
Item | Subproperty | propertyA | propertyB |
---|---|---|---|
item1 | subproperty1 | value1_A_1 | value1_B_1 |
item2 | subproperty1 | value2_A_1 | value2_B_1 |
item1 | subproperty2 | value1_A_2 | value1_B_2 |
item2 | subproperty2 | value2_A_2 | value2_B_2 |
and the GROUP BY
and STRING_AGG()
will yield the desired result:
Item | propertyA | propertyB |
---|---|---|
item1 | value1_A_1 value1_A_2 | value1_B_1 value1_B_2 |
item2 | value2_A_1 value2_A_2 | value2_B_1 value2_B_2 |
The WITHIN GROUP(...)
clause is not actually required, but guarantees consistent results.
ADDENDUM: For the case where a sum of numbers is need, the STRING_AGG() WITHIN GROUP()
can be replaced with a simple SUM()
. Using SUM()
instead of MAX()
in the pivot is also recommended.
-- Pivot plus group by aggregation (Sum of numeric values)
SELECT
P.Item,
SUM(P.propertyA) AS propertyA,
SUM(P.propertyB) AS propertyB
FROM @Data
PIVOT (
SUM(Thing)
FOR Property IN (propertyA, propertyB)
) P
GROUP BY P.Item
ORDER BY P.Item
The Query can be further simplified to a plain old PIVOT
by eliminating unwanted columns from the initial select that feeds the PIVOT
.
-- Using just a regular PIVOT after excluding unwanted columns
SELECT P.Item, P.propertyA, P.propertyB
FROM (SELECT Item, Property, Thing FROM @Data) D
PIVOT (
SUM(Thing)
FOR Property IN (propertyA, propertyB)
) P
Lastly (I hope), a technique called "Conditional Aggregation" can be used in place of the PIVOT
. Sometimes this is actually easier to read, understand, and maintain.
-- Using conditional aggregation instead of PIVOT
SELECT
D.Item,
SUM(CASE WHEN D.Property = 'propertyA' THEN D.Thing END) AS propertyA,
SUM(CASE WHEN D.Property = 'propertyB' THEN D.Thing END) AS propertyB
FROM @Data D
GROUP BY D.Item
ORDER BY D.Item
See this db<>fiddle for the above techniques.
CodePudding user response:
I would choose a dynamic approach, as property will have morwe than 2 item i guess.
I choose to use integers for thig as you wante4d sums
CREATE TABLE tab1
([Item] varchar(5), [Property] varchar(9), [Subproperty] varchar(12), [Thing] int)
;
INSERT INTO tab1
([Item], [Property], [Subproperty], [Thing])
VALUES
('item1', 'propertyA', 'subproperty1', 1),
('item1', 'propertyA', 'subproperty2', 2),
('item1', 'propertyB', 'subproperty1', 3),
('item1', 'propertyB', 'subproperty2', 4),
('item2', 'propertyA', 'subproperty1', 5),
('item2', 'propertyA', 'subproperty2', 6),
('item2', 'propertyB', 'subproperty1', 7),
('item2', 'propertyB', 'subproperty2', 8)
;
8 rows affected
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX);
SET @cols = (SELECT STRING_AGG([Property],',') FROM (SELECT DISTINCT [Property] FROM tab1 WHERE [Property] IS NOT NULL)t);
set @query = '
SELECT [Item], ' @cols ' from
(
SELECT
[Item], [Property], SUM([Thing]) as SumThing
FROM tab1
GROUP BY [Item], [Property]
) x
pivot
(
SUM(SumThing)
for [Property] in (' @cols ')
) p ';
execute(@query);
Item | propertyA | propertyB |
---|---|---|
item1 | 3 | 7 |
item2 | 11 | 15 |