Is there a better way to get the Site Names that represent the top 25% of total cases. My approach worked but I wonder if there is a better way. I used =SUM(B3:B426)/4 which resulted in 281.5, then I added 0 44 to get 44. Then I added 44 39 to get 83. I repeated the process until I got close to 281.5. Any Suggestions Is appreciated.
Row Labels | Count of Site Name | 281.5 |
---|---|---|
Site - 470 | 44 | 44 |
Site - 316 | 39 | 83 |
Site - 222 | 38 | 121 |
Site - 496 | 34 | 155 |
Site - 279 | 20 | 175 |
Site - 435 | 16 | 191 |
Site - 335 | 16 | 207 |
Site - 507 | 15 | 222 |
Site - 301 | 15 | 237 |
Site - 413 | 14 | 251 |
Site - 542 | 13 | 264 |
Site - 473 | 12 | 276 |
Site - 469 | 12 | 288 |
Site - 136 | 12 | |
Site - 506 | 11 | |
Site - 498 | 10 | |
Site - 427 | 10 | |
Site - 277 | 9 | |
Site - 522 | 8 | |
Site - 424 | 8 | |
Site - 228 | 8 | |
Site - 233 | 8 | |
Site - 275 | 8 | |
Site - 141 | 8 | |
Site - 494 | 7 | |
Site - 230 | 7 | |
Site - 208 | 7 | |
Site - 253 | 7 | |
Site - 439 | 7 | |
Site - 366 | 7 | |
Site - 151 | 7 | |
Site - 520 | 6 |
CodePudding user response:
There's functions in Excel O365 that make this simpler. If you don't, then:
As you have done, make column 3 a running total of the counts, but just continue down to the last value instead of stopping when you're close.
The formula
=INDEX(B3:B426, MATCH( SUM(B3:B426)*0.25 , C3:C426, -1 ))
will scan down your accumulated total in Column C and find the value that is equal, or the next higher value than the 25% mark. It will then return the actual count from Column B of that next higher amount.So I can make a column D that is
=IF(C3 >= INDEX(B3:B426, MATCH( SUM(B3:B426)*0.25 , C3:C426, -1 )), C3, "")
and copy that down.
This approach seems a little more complicated than needed, but that's because I assume you want to include the final site the tripped over the 25% mark. If not, the logic gets even easier.
If you have O365 there's another single-formula approach, I'll add it if you need it.
CodePudding user response:
I use as a reference 75% (cell E1
) in order to select more rows for illustration purpose. On cell D3
put the following formula:
=LET(percentage, E1, cumSum, SCAN(0, B2:B10, LAMBDA(a,b,a b)), topIdx,
XMATCH(MAX(cumSum)*percentage, cumSum, -1),
A2:INDEX($A$2:$B$10, topIdx, 2))
where MAX(cumSum)*percentage
represents the 75% of the total cumulative sum. For illustration purpose I am considering only the fist 9
rows of the input sample.
Here is the output:
which selects the first 5
rows, because the output of XMATCH
is 5
which represents the position of the cumulative sum of: 175
. The 75% of the cumulative sum is: 177.75
so up to row 5
is the closed cumulative lower or equal than 177.75
. The third input argument (match_mode
) of XMATCH
-1
indicates the position of exact match or next smallest item, i.e. the closest from the top to down, if you want the closest from bottom up then you need to set it to 1
.
The name cumSum
, represents the cumulative sum of Count of Site Name as follow (check SCAN documentation for more information):
SCAN(0, B2:B10, LAMBDA(a,b,a b))
so there is no need to have a helper column for this intermediate calculation under Office 365.
To select the first x-rows (topIdx
) is achieved via the following formula:
A2:INDEX($A$2:$B$10, topIdx, 2)
LET
function is used to avoid repetition of the same elements in the formula and also to identify intermediate results. You can test each one of them replacing the last calculation of the formula with the specific name you would like to verify the output. For example replacing: A2:INDEX($A$2:$B$10, topIdx, 2)
with topIdx
, will return this value.