I am trying to implement a function that checks whether a counter contains a "similar" percentage of each item. For example:
from collections import Counter
c = Counter(["Dog", "Cat", "Dog", "Horse", "Dog"])
size = 5
lst = list(c.values())
percentages = [x / size * 100 for x in lst] # [60.0, 20.0, 20.0]
How can I check whether those percentages are all "similar"? I would like to apply math.isclose with abs_tol=2, but it takes two arguments, not an entire list.
In the example above, the items do not occur with similar frequency.
This method will be used for checking whether a training set of labels is balanced or not.
CodePudding user response:
One way is to pick the minimum and maximum values of the percentages list and pass those to isclose(). If the two extremes are within the tolerance of each other, every other pair is as well:
from math import isclose
from collections import Counter
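def is_balanced(lst, abs_tol):
    c = Counter(lst)
    total = c.total()  # Counter.total() requires Python 3.10+
    percentages = [(v / total) * 100 for v in c.values()]
    # Comparing the smallest and largest percentage covers all pairs
    return isclose(min(percentages), max(percentages), abs_tol=abs_tol)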
lst1 = ["Dog", "Cat", "Dog", "Horse", "Dog"]
lst2 = ["Dog", "Cat", "Horse"]
print(is_balanced(lst1, 2)) # False
print(is_balanced(lst2, 2)) # True
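To illustrate why comparing only the extremes is enough for an absolute tolerance, here is a small sketch that checks every pair explicitly and compares the result with the min/max shortcut. The helper names percentages, all_pairs_close and extremes_close are made up for this illustration, and sum(counts.values()) is used in place of Counter.total() so that it also runs on Python versions before 3.10.

```python
from collections import Counter
from itertools import combinations
from math import isclose


def percentages(labels):
    """Return the percentage of the list taken up by each distinct label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [count / total * 100 for count in counts.values()]


def all_pairs_close(values, abs_tol):
    """Brute force: test every pair of values with math.isclose."""
    return all(isclose(a, b, abs_tol=abs_tol) for a, b in combinations(values, 2))


def extremes_close(values, abs_tol):
    """Shortcut: only the smallest and the largest value are compared."""
    return isclose(min(values), max(values), abs_tol=abs_tol)


for labels in (["Dog", "Cat", "Dog", "Horse", "Dog"], ["Dog", "Cat", "Horse"]):
    values = percentages(labels)
    print(all_pairs_close(values, 2), extremes_close(values, 2))
```

Both checks should agree on these inputs. The shortcut needs only one pass for min and one for max, whereas the brute-force version grows quadratically with the number of distinct labels.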
CodePudding user response:
Using np.isclose():
from collections import Counter
import numpy as np
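def is_balanced(lst) -> bool:
    c = Counter(lst)
    # Fraction of the list taken up by each distinct item
    fractions = np.asarray(list(c.values())) / len(lst)
    # Balanced means every fraction is close to the uniform share 1 / len(c)
    return bool(np.isclose(fractions, 1 / len(c)).all())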
See the documentation of np.isclose() for arguments like atol, rtol, etc.
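With the default tolerances, np.isclose() only accepts fractions that are almost exactly equal to the uniform share, so for real label sets you will probably want to pass atol yourself. A minimal sketch, assuming a non-empty list and a tolerance expressed as a fraction (0.02 corresponds to 2 percentage points); the default value of atol here is my own choice, not something from the question:

```python
from collections import Counter

import numpy as np


def is_balanced(labels, atol=0.02):
    """Check that every class share is within atol of the uniform share.

    Assumes labels is non-empty. atol is a fraction, not a percentage.
    """
    counts = Counter(labels)
    fractions = np.array(list(counts.values())) / len(labels)
    uniform_share = 1 / len(counts)
    # rtol=0.0 turns off the relative term, so only the absolute tolerance applies
    return bool(np.isclose(fractions, uniform_share, rtol=0.0, atol=atol).all())


print(is_balanced(["Dog", "Cat", "Dog", "Horse", "Dog"]))
print(is_balanced(["Dog", "Cat", "Horse"]))
print(is_balanced(["a"] * 51 + ["b"] * 49))
```

Note that this compares each share with the ideal uniform share, which is slightly different from comparing the smallest share with the largest one: two classes can each be within atol of the uniform share and still be up to 2 * atol apart from each other.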