I have a data frame in which I need to bin one column into ranges derived from a list. I can get the result with hard-coded values, but the input list is dynamic, so the ranges have to be built from whatever list is passed in.
My data frame looks like this:
import pandas as pd
rangelist=[90,70,50]
data = {'Result': [75,85,95,45,76,8,10,44,22,65,35,67]}
sampledf=pd.DataFrame(data)
rangelist is my list; from it I need to create ranges such as 100-90, 90-70 and 70-50. These ranges may change from time to time. So far I have been getting results with the function below.
def cat(value):
    cat=''
    if (value>90):
        cat='90-100'
    if (value<90 and value>70 ):
        cat='90-70'
    else:
        cat='< 50'
    return cat
sampledf['category']=sampledf['Result'].apply(cat)
How can I pass dynamic values into the function cat based on rangelist? I would be grateful if someone could help me achieve the result below.
    Result category
0       75    90-70
1       85    90-70
2       95     < 50
3       45     < 50
4       76    90-70
5        8     < 50
6       10     < 50
7       44     < 50
8       22     < 50
9       65     < 50
10      35     < 50
11      67     < 50
CodePudding user response:
I would recommend pd.cut for this:
import numpy as np
sampledf['Category'] = pd.cut(sampledf['Result'],
                              [-np.inf] + sorted(rangelist) + [np.inf])
Output:
    Result      Category
0       75  (70.0, 90.0]
1       85  (70.0, 90.0]
2       95   (90.0, inf]
3       45  (-inf, 50.0]
4       76  (70.0, 90.0]
5        8  (-inf, 50.0]
6       10  (-inf, 50.0]
7       44  (-inf, 50.0]
8       22  (-inf, 50.0]
9       65  (50.0, 70.0]
10      35  (-inf, 50.0]
11      67  (50.0, 70.0]
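If you want string labels in the question's style (such as 90-70) instead of interval objects, pd.cut also accepts a labels argument. Below is a minimal sketch of building those labels from the list. The helper name make_bins_and_labels and the upper=100 cap used in the top label are my own assumptions, not something given in the question.

```python
import numpy as np
import pandas as pd

rangelist = [90, 70, 50]
sampledf = pd.DataFrame({'Result': [75, 85, 95, 45, 76, 8, 10, 44, 22, 65, 35, 67]})

def make_bins_and_labels(limits, upper=100):
    """Hypothetical helper: derive pd.cut bins and text labels from a list of limits."""
    edges = sorted(limits)                       # e.g. [50, 70, 90]
    bins = [-np.inf] + edges + [np.inf]          # open-ended on both sides
    labels = [f'< {edges[0]}']                   # everything up to the lowest limit
    labels += [f'{hi}-{lo}' for lo, hi in zip(edges[:-1], edges[1:])]
    labels.append(f'{upper}-{edges[-1]}')        # everything above the highest limit
    return bins, labels

bins, labels = make_bins_and_labels(rangelist)
# One label per bin: len(labels) == len(bins) - 1
sampledf['category'] = pd.cut(sampledf['Result'], bins=bins, labels=labels)
print(sampledf)
```

Note that pd.cut closes each bin on the right by default, so a value of exactly 50 lands in '< 50' and exactly 90 lands in '90-70'. Passing right=False flips that if you need the other convention.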
CodePudding user response:
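import numpy as np
# Breakpoints in descending order; every value in Result must lie in [0, 100)
breaks = pd.Series([100, 90, 75, 50, 45, 20, 0])
# Position of the first breakpoint that is <= the value (the interval's lower bound)
sampledf["ind"] = sampledf.Result.apply(lambda x: np.where(x >= breaks)[0][0])
# Pair that lower bound with the breakpoint just above it
sampledf["category"] = sampledf.ind.apply(lambda i: (breaks[i], breaks[i-1]))
sampledf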
# Result ind category
# 0 75 2 (75, 90)
# 1 85 2 (75, 90)
# 2 95 1 (90, 100)
# 3 45 4 (45, 50)
# 4 76 2 (75, 90)
# 5 8 6 (0, 20)
# 6 10 6 (0, 20)
# 7 44 5 (20, 45)
# 8 22 5 (20, 45)
# 9 65 3 (50, 75)
# 10 35 5 (20, 45)
# 11 67 3 (50, 75)
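The same lookup can probably be done without the two apply calls. Here is a rough sketch using np.searchsorted on ascending breakpoints. Clipping out-of-range values into the first or last interval is my own assumption about how edge cases should behave, so adjust it if you would rather raise an error.

```python
import numpy as np
import pandas as pd

sampledf = pd.DataFrame({'Result': [75, 85, 95, 45, 76, 8, 10, 44, 22, 65, 35, 67]})
breaks = np.array([0, 20, 45, 50, 75, 90, 100])   # ascending this time

# side='right' returns i such that breaks[i-1] <= x < breaks[i]
idx = np.searchsorted(breaks, sampledf['Result'].to_numpy(), side='right')
# Assumption: values below 0 or at/above 100 fall into the nearest edge interval
idx = np.clip(idx, 1, len(breaks) - 1)

# Store each interval as a (lower, upper) tuple
sampledf['category'] = list(zip(breaks[idx - 1], breaks[idx]))
print(sampledf)
```

For the sample data this gives the same (lower, upper) pairs as the apply version, with intervals closed on the left.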