Home > Back-end >  How to get cumulative sum of a calculated column ordered in ascending order within each group in SQL
How to get cumulative sum of a calculated column ordered in ascending order within each group in SQL

Time:08-19

I have the following table in Teradata:

group1 group2
A A
A B
A B
A C
A C
A C
A C
B C
B A
B A
B A
B D
CREATE VOLATILE TABLE student
(
group1 varchar(10),
group2 varchar(10)
)
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;

INSERT INTO student values ('A','A');
INSERT INTO student values ('A','B');
INSERT INTO student values ('A','B');
INSERT INTO student values ('A','C');
INSERT INTO student values ('A','C');
INSERT INTO student values ('A','C');
INSERT INTO student values ('A','C');
INSERT INTO student values ('B','A');
INSERT INTO student values ('B','A');
INSERT INTO student values ('B','A');
INSERT INTO student values ('B','A');
INSERT INTO student values ('B','D');

I am trying to get a table as follows, where the percentage column should be ordered ascending before cumulative sum:

group1 group2 count_both_groups group1_count percentage cumsum
A A 1 7 0.143 0.143
A B 2 7 0.286 0.429
A C 4 7 0.571 1.000
B D 1 5 0.200 0.200
B A 4 5 0.800 1.000

I can get all columns right expect the last one with:

 SELECT 
    group1,
    group2,
    COUNT (*) count_both_groups,
    SUM (count_both_groups) OVER (PARTITION BY group1) AS group1_count,
    (0.000 count_both_groups)/group1_count as percentage
    FROM student
    GROUP BY group1,group2
    order BY group1,percentage;

To add last column as well, I try:

 SELECT 
    group1,
    group2,
    COUNT (*) count_both_groups,
    SUM (count_both_groups) OVER (PARTITION BY group1) AS group1_count,
    (0.000 count_both_groups)/group1_count as percentage,
    sum (percentage) over (partition by group1 order by count_both_groups asc ROWS UNBOUNDED PRECEDING) AS cumsum
    FROM student
    GROUP BY group1,group2;

I get the error:

Ordered Analytical Functions can not be nested. 

CodePudding user response:

First group by group1 and group2 to get the column count_both_groups and use SUM() window function to get group1_count and by dividing them get percentage.
Finally use SUM() window function again to get the cumulative sum of percentage:

WITH cte AS (
  SELECT *,
         COUNT(*) count_both_groups,
         SUM(COUNT(*)) OVER (PARTITION BY group1) group1_count,
         1.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY group1) percentage
  FROM student
  GROUP BY group1, group2
)
SELECT *, SUM(percentage) OVER (PARTITION BY group1 ORDER BY percentage) cumsum
FROM cte;

See the demo (works in MySql and other databases).

CodePudding user response:

You can do this in a single pass like so, I think:

  SELECT 
group1,
group2,
COUNT (*) count_both_groups,
SUM (count_both_groups) OVER (PARTITION BY group1) AS group1_count,
(0.000 count_both_groups)/group1_count as percentage,
sum (0.000  count_both_groups) over (partition by group1 order by count_both_groups asc ROWS UNBOUNDED PRECEDING)/group1_count cum_sum
FROM student
GROUP BY group1,group2
order BY group1,cum_sum;
  • Related