Home > Mobile >  How to get Sites Names that represent the top 25% volume of total cases in excel?
How to get Sites Names that represent the top 25% volume of total cases in excel?

Time:10-21

Is there a better way to get the Site Names that represent the top 25% of total cases. My approach worked but I wonder if there is a better way. I used =SUM(B3:B426)/4 which resulted in 281.5, then I added 0 44 to get 44. Then I added 44 39 to get 83. I repeated the process until I got close to 281.5. Any Suggestions Is appreciated.

Row Labels Count of Site Name 281.5
Site - 470 44 44
Site - 316 39 83
Site - 222 38 121
Site - 496 34 155
Site - 279 20 175
Site - 435 16 191
Site - 335 16 207
Site - 507 15 222
Site - 301 15 237
Site - 413 14 251
Site - 542 13 264
Site - 473 12 276
Site - 469 12 288
Site - 136 12
Site - 506 11
Site - 498 10
Site - 427 10
Site - 277 9
Site - 522 8
Site - 424 8
Site - 228 8
Site - 233 8
Site - 275 8
Site - 141 8
Site - 494 7
Site - 230 7
Site - 208 7
Site - 253 7
Site - 439 7
Site - 366 7
Site - 151 7
Site - 520 6

CodePudding user response:

There's functions in Excel O365 that make this simpler. If you don't, then:

  • As you have done, make column 3 a running total of the counts, but just continue down to the last value instead of stopping when you're close.

  • The formula =INDEX(B3:B426, MATCH( SUM(B3:B426)*0.25 , C3:C426, -1 )) will scan down your accumulated total in Column C and find the value that is equal, or the next higher value than the 25% mark. It will then return the actual count from Column B of that next higher amount.

  • So I can make a column D that is =IF(C3 >= INDEX(B3:B426, MATCH( SUM(B3:B426)*0.25 , C3:C426, -1 )), C3, "") and copy that down.

This approach seems a little more complicated than needed, but that's because I assume you want to include the final site the tripped over the 25% mark. If not, the logic gets even easier.

If you have O365 there's another single-formula approach, I'll add it if you need it.

CodePudding user response:

I use as a reference 75% (cell E1) in order to select more rows for illustration purpose. On cell D3 put the following formula:

=LET(percentage, E1, cumSum, SCAN(0, B2:B10, LAMBDA(a,b,a b)), topIdx, 
 XMATCH(MAX(cumSum)*percentage, cumSum, -1), 
 A2:INDEX($A$2:$B$10, topIdx, 2))

where MAX(cumSum)*percentage represents the 75% of the total cumulative sum. For illustration purpose I am considering only the fist 9 rows of the input sample.

Here is the output:

sample excel file

which selects the first 5 rows, because the output of XMATCH is 5 which represents the position of the cumulative sum of: 175. The 75% of the cumulative sum is: 177.75 so up to row 5 is the closed cumulative lower or equal than 177.75. The third input argument (match_mode) of XMATCH -1 indicates the position of exact match or next smallest item, i.e. the closest from the top to down, if you want the closest from bottom up then you need to set it to 1.

The name cumSum, represents the cumulative sum of Count of Site Name as follow (check SCAN documentation for more information):

SCAN(0, B2:B10, LAMBDA(a,b,a b))

so there is no need to have a helper column for this intermediate calculation under Office 365.

To select the first x-rows (topIdx) is achieved via the following formula:

A2:INDEX($A$2:$B$10, topIdx, 2)

LET function is used to avoid repetition of the same elements in the formula and also to identify intermediate results. You can test each one of them replacing the last calculation of the formula with the specific name you would like to verify the output. For example replacing: A2:INDEX($A$2:$B$10, topIdx, 2) with topIdx, will return this value.

  • Related