I have the loan dataset below -
Sector | Total Units | Bad units | Bad Rate |
---|---|---|---|
Retail Trade | 16 | 5 | 31% |
Construction | 500 | 1100 | 20% |
Healthcare | 165 | 55 | 33% |
Mining | 3 | 2 | 67% |
Utilities | 56 | 19 | 34% |
Other | 300 | 44 | 15% |
How can I create a ranking function to sort this data based on the bad_rate while also accounting for the number of units ?
e.g This is the result when I sort in descending order based on bad_rate
Sector | Total Units | Bad units | Bad Rate |
---|---|---|---|
Mining | 3 | 2 | 67% |
Utilities | 56 | 19 | 34% |
Healthcare | 165 | 55 | 33% |
Retail Trade | 16 | 5 | 31% |
Construction | 500 | 1100 | 20% |
Other | 300 | 44 | 15% |
Here, Mining shows up first but I don't really care about this sector as it only has a total of 3 units. I would like construction, other and healthcare to show up on the top as they have more # of total as well as bad units
CodePudding user response:
STEP 1) is easy...
Use SORT("Range","ByColNumber","Order")
Just put it in the top left cell of where you want your sorted data.
=SORT(B3:E8,4,-1)
:
STEP 2)
Here's the tricky part... you need to decide how to weight the outage.
Here, I found multiplying the Rate% by the Total Unit Rank:
I think this approach gives pretty good results... you just need to play with the formula!
Please let me know what formula you eventually use!
CodePudding user response:
You would need to define sorting criteria, since you don't have a priority based on column, but instead a combination. I would suggest defining a function that weighs both columns: Total Units and Bad Rate. Using a weight function would be a good idea, but first, we would need to normalize both columns. For example put the data in a range 0-100
, so we can weigh each column having similar values. Once you have the data normalized then you can use criteria like this:
w_1 * x w_2 * y
This is the main idea. Now to put this information in Excel. We create an additional temporary variable with the previous calculation and name it crit
. We Define a user LAMBDA
function SORT_BY
for calculating crit
as follows:
LAMBDA(a,b, wu*a wbr*b)
and we use MAP
to calculate it with the normalized data. For convenience we define another user LAMBDA
function to normalize the data: NORM
as follows:
LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x)))
Here is the formula:
=LET(wu, 0.6, wbr, 0.8, u, B2:B7, br, D2:D7, SORT_BY, LAMBDA(a,b, wu*a wbr*b),
NORM, LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x))),
crit, MAP(NORM(u), NORM(br), LAMBDA(a,b, SORT_BY(a,b))),
DROP(SORT(HSTACK(A2:D7, crit),5,-1),,-1))
You can customize how to weight each column (via wu
for Total Units and wbr
for Bad Rates columns). Finally, we present the result removing the sorting criteria (crit
) via the DROP
function. If you want to show it, then remove this step.