Here is an example of the data I'm trying to organize:
I'm looking for a way to automatically see the top 3 categories (column) for each Name# (row). The size of the category is determined by the number below the category.
Ideally, I'd also like to see a percentage breakdown (from the total) for each category. For example, in row "Name3" 2 categories make up a significantly larger portion of the total values. However, without this percentage breakdown, the 3 top values would seem to be comparable, when they are in fact, not.
Interested to see how this would all work with duplicate numbers, too.
I've tried Excel's rank function, but this doesn't tell me the categories that have the 3 largest sizes, just the 3 highest values.
CodePudding user response:
With Office 365:
=FILTER(SORTBY($B$1:$H$1,B2:H2,-1),SORT(B2:H2,1,-1,TRUE)>=LARGE(B2:H2,3))
And copy down.
If there are ties it will expand the results to include it. It finds the third highest value and returns everything that is equal to or greater than it.
CodePudding user response:
This approach spills all the results at once (array version). In cell J2
, you can put the following formula:
=LET(D, A1:H5, A, TAKE(D,,1), DROP(REDUCE("", DROP(A,1), LAMBDA(ac,aa,
VSTACK(ac, TAKE(SORT(DROP(FILTER(D, (A=aa) (A="")),,1),2,-1,1),1,3)))),1))
It assumes as per input data the cell A1
is empty (if not it can be adjusted accordingly). Here is the output:
An alternative that doesn't require previous assumption (but it is not really a hard one) is the following:
=LET(names, A2:A5, Data, B2:H5, colors, B1:H1, DROP(REDUCE("", names,
LAMBDA(ac,n, VSTACK(ac, TAKE(SORT(VSTACK(colors, INDEX(Data, XMATCH(n,names),0))
,2,-1,TRUE),1,3)))),1))
The non-array version can be obtained from previous approach, and expand it down:
=TAKE(SORT(VSTACK($B$1:$H$1,INDEX($B$2:$H$5, XMATCH(A2,$A$2:$A$5),0)),2,-1,TRUE),1,3)
Explanation
To spill the entire solution it uses DROP/REDUCE/VSTACK
pattern. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length.
For the first formula we filter for a given element of A
name (aa
) via FILTER
the input data (D
) to select rows where the name is empty (to consider the header) OR (plus (
) condition) the name is equal to aa
. We remove via DROP
the first column of the filter result (names column). Next we SORT
by the second row (the first rows are the colors) in descending order (-1
) by column (last input parameter of SORT
we can use TRUE
or 1
). Finally, we use TAKE
to take the first three columns and the first row.
For the second approach, we select the values for a given row (names
equals n
) and use INDEX
to select the entire row (column index 0
), then we form an array via VSTACK
to add as first row the colors
and use the similar logic as in previous approach for sorting and select the corresponding rows and column (colors
).
Notes:
- If you don't have
VSTACK
function available, then you can replace it as follow:CHOOSE({1;2}, arr1,arr2)
and substitutearr1
,arr2
, wit the corresponding arrays. - In the second formula instead of
INDEX/XMATCH
you can use:DROP(FILTER(Data, names=n),,1)
, it is a matter of personal preference.