Home > Software engineering >  Pivot/Conditional aggregation data with count and SUM In t-Sql Query
Pivot/Conditional aggregation data with count and SUM In t-Sql Query

Time:06-14

Let's say we have data as below

EmpId   Stages  Amount
101     Stg1    10.2
101     Stg2    10.22
101     Stg1    10.11
101     Stg3    6.21
101     Stg2    3.22
102     Stg1    3.23
102     Stg2    2.22
102     Stg3    1.22
102     Stg3    3.22

And the result required: Each employee wise stages count and stage wise sum amount as below

EmpId   Stg1-Count  Amount-stg1 Stg2-Count  Amount-stg2 Stg3-Count  Amount-stg3
101         2       20.31           2           13.44       1           6.21
102         1       3.23            1           2.22        2           4.44

CodePudding user response:

Here is an approach that uses the straightforward syntax of conditional aggregations.

/* Set up the sample data */
DROP TABLE IF EXISTS #RawData;
CREATE TABLE #RawData
 (
     EmpId INT,
     Stages NVARCHAR(5),
     Amount DECIMAL(9, 4)
 );
INSERT INTO #RawData
VALUES (101, N'Stg1', 10.2),
       (101, N'Stg2', 10.22),
       (101, N'Stg1', 10.11),
       (101, N'Stg3', 6.21),
       (101, N'Stg2', 3.22),
       (102, N'Stg1', 3.23),
       (102, N'Stg2', 2.22),
       (102, N'Stg3', 1.22),
       (102, N'Stg3', 3.22),
       (103, N'Stg1', 1.55);

I added a row to ensure we're handling incomplete and in-process stage progressions.

This assumes that there are a fixed number of "stages" or categories that will be counted and summed. If there are a large or changing number of values, this can become tedious to set up, but the code is pretty straightforward:

SELECT d.EmpId,
       SUM(CASE WHEN d.Stages = 'Stg1' THEN 1 ELSE 0 END)        AS [Stg1-Count],
       SUM(CASE WHEN d.Stages = 'Stg1' THEN ISNULL(d.Amount, 0) ELSE 0 END) AS [Amount-Stg1],
       SUM(CASE WHEN d.Stages = 'Stg2' THEN 1 ELSE 0 END)        AS [Stg2-Count],
       SUM(CASE WHEN d.Stages = 'Stg2' THEN ISNULL(d.Amount, 0) ELSE 0 END) AS [Amount-Stg2],
       SUM(CASE WHEN d.Stages = 'Stg3' THEN 1 ELSE 0 END)        AS [Stg3-Count],
       SUM(CASE WHEN d.Stages = 'Stg3' THEN ISNULL(d.Amount, 0) ELSE 0 END) AS [Amount-Stg3]
  FROM #RawData AS d
 GROUP BY d.EmpId;
EmpId Stg1-Count Amount-Stg1 Stg2-Count Amount-Stg2 Stg3-Count Amount-Stg3
101 2 20.3100 2 13.4400 1 6.2100
102 1 3.2300 1 2.2200 2 4.4400
103 1 1.5500 0 0.0000 0 0.0000

It uses a GROUP BY on the EmpId to aggregate everything per employee, and CASEs are evaluated to construct the additional columns. Using a SUM rather than a COUNT provides for a conditional counting, otherwise we'd just get the count of all rows for that employee regardless of the 'Stage' value.

CodePudding user response:

One option for this is to use dynamic SQL and PIVOT to get a more elastic result set.

Using an adapted version of your result set that includes some new 'Stages' values

/* Set up the sample data */
DROP TABLE IF EXISTS #RawData;

CREATE TABLE #RawData
     (
         EmpId INT,
         Stages NVARCHAR(5),
         Amount DECIMAL(9, 4)
     );

DECLARE @SqlStatement NVARCHAR(MAX);

INSERT INTO #RawData
VALUES (101, N'Stg1', 10.2),
       (101, N'Stg2', 10.22),
       (101, N'Stg1', 10.11),
       (101, N'Stg3', 6.21),
       (101, N'Stg2', 3.22),
       (102, N'Stg1', 3.23),
       (102, N'Stg2', 2.22),
       (102, N'Stg3', 1.22),
       (102, N'Stg3', 3.22),
       (103, N'Stg0', 4.00),
       (103, N'Stg1', 1.55),
       (103, N'Stg4', 14.00);

Next, we'll need some constructs for working with those data in dynamic SQL:

/* Set up some variables to store comma-delimited values and column names */
    DECLARE @stages      NVARCHAR(255),
            @amountAlias NVARCHAR(255),
            @countAlias  NVARCHAR(255),
            @resultCols  NVARCHAR(255);

/* Store comma-delimited column names and selection aliases in variables for use in the dynamic blocks */
SELECT @stages      = STRING_AGG(d.Stages, ','),
       @amountAlias = STRING_AGG(CONCAT(N'ISNULL(a.', d.Stages, N', 0) AS [Amount-', d.Stages, N']'), ', '),
       @countAlias  = STRING_AGG(CONCAT(N'ISNULL(c.', d.Stages, N', 0) AS [', d.Stages, N'-Count]'), ', '),
       @resultCols  = STRING_AGG(CONCAT(N'c.[', d.Stages, N'-Count], a.[Amount-', d.Stages, N']'), ', ')
  FROM (SELECT DISTINCT Stages FROM #RawData) AS d;

Then, it's a matter of constructing the dynamic code using those variables:

SELECT @SqlStatement = N';WITH Amounts AS (SELECT a.EmpId,'   @amountAlias
                         N'
  FROM (SELECT EmpId, Stages, ISNULL(Amount, 0) AS Amount FROM #RawData) AS src
  PIVOT (SUM(src.Amount)
FOR src.Stages IN ('     @stages   N') ) AS a),
    Counts AS (SELECT c.EmpId,'   @countAlias
                         N'
  FROM (SELECT EmpId, Stages, ISNULL(Amount, 0) AS Amount FROM #RawData) AS src
  PIVOT (COUNT(src.Amount)
FOR src.Stages IN ('     @stages   N') ) AS c)
SELECT a.EmpId, '        @resultCols   N'
  FROM Amounts AS a
  JOIN Counts AS c
    ON a.EmpId = c.EmpId;';

Finally, you can execute the dynamic SQL using EXEC (@SqlStatement);

In this example, there are two CTEs defined using PIVOT operations: one for the amount aggregate, and the other for the counts. The column-name strings are constructed using the distinct 'Stages' values, and are aliased with the associated aggregation-type names.

This solution has the advantage of adjusting to and returning new 'Stages' values, but is more complicated and generates a more complex execution plan, more table scans, more logical reads, and at a larger scale will run into cardinality estimation limitations.

| EmpId | Stg0-Count | Amount-Stg0 | Stg1-Count | Amount-Stg1 | Stg2-Count | Amount-Stg2 | Stg3-Count | Amount-Stg3 | Stg4-Count | Amount-Stg4 |
|-------|------------|-------------|------------|-------------|------------|-------------|------------|-------------|------------|-------------|
|   101 |          0 |      0.0000 |          2 |     20.3100 |          2 |     13.4400 |          1 |      6.2100 |          0 |      0.0000 |
|   102 |          0 |      0.0000 |          1 |      3.2300 |          1 |      2.2200 |          2 |      4.4400 |          0 |      0.0000 |
|   103 |          1 |      4.0000 |          1 |      1.5500 |          0 |      0.0000 |          0 |      0.0000 |          1 |     14.0000 |
  • Related