I'm trying to calculate the cumulative percentage for each row Presto/Athena. Eg: If I have data like this
AccountID | UserID | HolidaysTaken
ABC | A | 4
ABC | B | 6
ABC | B | 3
ABC | K | 2
ABC | K | 3
ABC | X | 1
Now after running this query I get the below result.
SELECT AccountID, UserID, sum(HolidaysTaken) AS HolidaysTaken FROM table
WHERE AccountID = 'ABC'
GROUP BY AccountID, UserID
ORDER BY HolidaysTaken DESC
AccountID | UserID | HolidaysTaken
ABC | B | 9
ABC | K | 5
ABC | A | 4
ABC | X | 1
Total holiday taken by all users = 19
But I want to add 2 more columns.
EachUserPercentage
: The Percentage of holidays taken by each user from total holidays.
CumulativePercentage
: Cumulative sum of EachUserPercentage. This one I can do using this post
AccountID | UserID | HolidaysTaken | EachUserPercentage | CumulativePercentage
ABC | B | 9 | 47.36 | 47.36
ABC | K | 5 | 26.31 | 73.67
ABC | A | 4 | 21.05 | 94.72
ABC | X | 1 | 5.26 | 100
I tried diff window function percent_rank(), cume_dist() and ntile()
but can't get proper EachUserPercentage
working.
CodePudding user response:
You can use window functions to find the percentage for an AccountID, then another window function to sum this over rows unbound preceding ordered by the total holidays taken per UserID. Something like as follows:
WITH totalUser
AS (SELECT AccountID
,UserID
,SUM(HolidaysTaken) AS HolidaysTaken
,CAST(100.0 * SUM(HolidaysTaken) / SUM(SUM(HolidaysTaken)) OVER (PARTITION BY AccountID) AS NUMERIC(5, 2)) AS EachUserPercentage
FROM table
WHERE AccountID = 'ABC'
GROUP BY AccountID
,UserID)
SELECT totalUser.AccountID
,totalUser.UserID
,totalUser.HolidaysTaken
,totalUser.EachUserPercentage
,SUM(totalUser.EachUserPercentage) OVER (PARTITION BY totalUser.AccountID
ORDER BY totalUser.EachUserPercentage DESC
ROWS UNBOUNDED PRECEDING)
FROM totalUser
ORDER BY totalUser.HolidaysTaken DESC;
CodePudding user response:
Hi you can get simply EachUserPercentage from below query if your group is on AccountID(considered).
SELECT table.AccountID, UserID, sum(table.HolidaysTaken) AS HolidaysTaken,
MAX(CAST(all_sum.HolidaysTaken AS NUMERIC(12,2))),
(SUM(CAST(table.HolidaysTaken AS NUMERIC(12,2)))/MAX(CAST(all_sum.HolidaysTaken AS NUMERIC(12,2))))*100 EachUserPercentage
FROM table
LEFT OUTER JOIN (SELECT SUM(HolidaysTaken) AS HolidaysTaken,AccountID FROM table GROUP BY AccountID)all_sum ON all_sum.AccountID= table.AccountID
WHERE table.AccountID = 'ABC'
GROUP BY table.AccountID, UserID
ORDER BY HolidaysTaken DESC
It works at my end.