I have a dataframe with several parameters:
par1 par2 par3 par4 par5
1.122208 1.054132 1.133250 1.114845 1.183850
1.076445 1.128663 0.998518 1.081816 1.006934
1.077058 1.561871 1.045255 1.120456 1.768667
0.904869 1.183985 0.938095 0.927841 1.201934
0.876596 1.044014 0.877457 0.871429 0.990452
...
Each parameter has its own threshold (threshold1 for par1, threshold2 for par2, and so on), and the thresholds differ from one another. For each row, I need to check whether at least two of the parameters are above their respective thresholds. It does not matter which parameters exceed their thresholds, as long as at least two of them do.
So far I have written an ugly nested if condition, but I was wondering what the best approach would be.
CodePudding user response:
Does this help solve your problem?
import pandas as pd

df = pd.DataFrame(
    {
        'par1': [1.122208, 1.076445, 1.077058, 0.904869, 0.876596],
        'par2': [1.054132, 1.128663, 1.561871, 1.183985, 1.044014],
        'par3': [1.133250, 0.998518, 1.045255, 0.938095, 0.877457],
        'par4': [1.114845, 1.081816, 1.120456, 0.927841, 0.871429],
        'par5': [1.183850, 1.006934, 1.768667, 1.201934, 0.990452],
    }
)
# one threshold per column, keyed by column name
thresholds = {
    'par1': 0.5,
    'par2': 3,
    'par3': 1.2,
    'par4': 1.1,
    'par5': 3,
}
def check_thresholds(input_row):
    # count how many values in this row exceed their column's threshold
    no_over_threshold = sum(
        value > thresholds[col_name] for col_name, value in input_row.items()
    )
    return no_over_threshold >= 2
# axis=1 passes each row to the function
df['above_thresholds'] = df.apply(check_thresholds, axis=1)
CodePudding user response:
Using Kelvin Ducray's sample data, we can take the solution a step further: avoid the row-wise apply and use pandas' vectorized operations instead, which should be faster:
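# convert the dict to a Series so it aligns with df's column labels
thresholds = pd.Series(thresholds)
# compare df with thresholds
# sum across the booleans in each row
# check True or False for >=2
above_thresholds = df.gt(thresholds).sum(axis=1).ge(2)
# assign returns a new DataFrame; df itself is not modified
df.assign(above_thresholds=above_thresholds)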
       par1      par2      par3      par4      par5  above_thresholds
0  1.122208  1.054132  1.133250  1.114845  1.183850              True
1  1.076445  1.128663  0.998518  1.081816  1.006934             False
2  1.077058  1.561871  1.045255  1.120456  1.768667              True
3  0.904869  1.183985  0.938095  0.927841  1.201934             False
4  0.876596  1.044014  0.877457  0.871429  0.990452             False
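If the real dataframe also holds columns that have no threshold, the comparison can be restricted to the thresholded columns first. Below is a minimal, self-contained sketch of that idea using the same sample data and thresholds as above. The 'label' column, the helper name at_least_n_above and its n parameter are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Sample data from the thread, plus a hypothetical non-parameter column 'label'.
df = pd.DataFrame(
    {
        'label': ['a', 'b', 'c', 'd', 'e'],
        'par1': [1.122208, 1.076445, 1.077058, 0.904869, 0.876596],
        'par2': [1.054132, 1.128663, 1.561871, 1.183985, 1.044014],
        'par3': [1.133250, 0.998518, 1.045255, 0.938095, 0.877457],
        'par4': [1.114845, 1.081816, 1.120456, 0.927841, 0.871429],
        'par5': [1.183850, 1.006934, 1.768667, 1.201934, 0.990452],
    }
)

# One threshold per parameter; the Series index must match the column names.
thresholds = pd.Series(
    {'par1': 0.5, 'par2': 3, 'par3': 1.2, 'par4': 1.1, 'par5': 3}
)


def at_least_n_above(frame, limits, n=2):
    """Return a boolean Series that is True where at least n of the
    thresholded columns exceed their limit (hypothetical helper)."""
    # Keep only the columns that have a threshold, so 'label' is ignored.
    params = frame[limits.index]
    # gt() matches the Series index to the column labels, sum(axis=1)
    # counts the True values per row, ge(n) tests the count.
    return params.gt(limits).sum(axis=1).ge(n)


df['above_thresholds'] = at_least_n_above(df, thresholds)
print(df)
```

Note that a comparison against NaN evaluates to False, so a missing value never counts as being above its threshold in this sketch.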