I want to assign each row to a category based on the value in a specific column of my dataframe. Here is the function:
def assign_SOP(df):
    if df['Strikeouts per Pitches'] <= 0.053571:
        return 'Below average'
    elif df['Strikeouts per Pitches'] >= 0.053571 and df['Strikes per Pitches'] < 0.059794:
        return 'Average'
    elif df['Strikeouts per Pitches'] >= 0.059794 and df['Strikes per Pitches'] < 0.068870:
        return 'Above Average'
    elif df['Strikeouts per Pitches'] >= 0.068870:
        return 'Elite'
# Creating columns for each category
df_MLB['SP Category'] = df_MLB.apply(assign_SP, axis=1)
df_MLB['SOP Category'] = df_MLB.apply(assign_SOP, axis=1)
Somehow it only works for 'Below average' and 'Elite'.
I used almost the same function for another column and it worked:
def assign_SP(df):
    if df['Strikes per Pitches'] <= 0.645129:
        return 'Below average'
    elif df['Strikes per Pitches'] >= 0.645129 and df['Strikes per Pitches'] < 0.656995:
        return 'Average'
    elif df['Strikes per Pitches'] >= 0.656995 and df['Strikes per Pitches'] < 0.672696:
        return 'Above Average'
    elif df['Strikes per Pitches'] >= 0.672696:
        return 'Elite'
Can someone help me out here?
CodePudding user response:
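A likely reason only 'Below average' and 'Elite' show up: the two middle branches of assign_SOP check the upper bound against 'Strikes per Pitches' instead of 'Strikeouts per Pitches'. If that column holds values around 0.65 (as the thresholds in assign_SP suggest), those conditions are never true, so the function falls through and returns None. Below is a minimal sketch of the function with the column name corrected, assuming that is the intended logic. The sample rows are made up for illustration only.

```python
import pandas as pd

def assign_SOP(row):
    # Every comparison uses the same column; the original mixed in
    # 'Strikes per Pitches' for the upper bounds of the middle branches.
    sop = row['Strikeouts per Pitches']
    if sop <= 0.053571:
        return 'Below average'
    elif sop < 0.059794:
        return 'Average'
    elif sop < 0.068870:
        return 'Above Average'
    return 'Elite'

# Hypothetical sample data, not taken from the real df_MLB.
df_MLB = pd.DataFrame({
    'Strikeouts per Pitches': [0.050294, 0.056730, 0.064281, 0.078766],
    'Strikes per Pitches': [0.640000, 0.650000, 0.660000, 0.680000],
})
df_MLB['SOP Category'] = df_MLB.apply(assign_SOP, axis=1)
print(df_MLB['SOP Category'].tolist())
```

Because each branch is only reached when the previous ones failed, the lower-bound checks can be dropped without changing the result.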
I would use pandas.cut to save time, energy and memory:
import numpy as np
import pandas as pd

categories = ['Below average', 'Average', 'Above Average', 'Elite']
# Bin edges; pd.cut closes each interval on the right by default
values = [0, 0.053571, 0.059794, 0.068870, np.inf]
df["SOP Category"] = pd.cut(df["Strikeouts per Pitches"], bins=values, labels=categories, include_lowest=True)
# Output:
print(df)
    Strikeouts per Pitches   SOP Category
0                 0.064281  Above Average
1                 0.054225        Average
2                 0.064516  Above Average
3                 0.063732  Above Average
4                 0.060326  Above Average
5                 0.056730        Average
6                 0.078766          Elite
7                 0.068870  Above Average
8                 0.058195        Average
9                 0.052836  Below average
10                0.050294  Below average
11                0.057866        Average
12                0.074221          Elite
13                0.059794        Average
14                0.052574  Below average
15                0.045643  Below average
16                0.048541  Below average
17                0.065417  Above Average
18                0.064903  Above Average
19                0.077328          Elite
NB: You have to call pd.cut separately for each column, each with its own bin edges.
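As a rough sketch of what "one cut per column" could look like, the snippet below loops over both columns from the question. The bin edges come from the thresholds in assign_SOP and assign_SP. The `bins` and `targets` dictionaries and the sample rows are hypothetical names and values introduced here for illustration.

```python
import numpy as np
import pandas as pd

categories = ['Below average', 'Average', 'Above Average', 'Elite']

# Bin edges per source column, taken from the thresholds in the question.
bins = {
    'Strikeouts per Pitches': [0, 0.053571, 0.059794, 0.068870, np.inf],
    'Strikes per Pitches': [0, 0.645129, 0.656995, 0.672696, np.inf],
}
# Output column name for each source column.
targets = {
    'Strikeouts per Pitches': 'SOP Category',
    'Strikes per Pitches': 'SP Category',
}

# Made-up sample rows, for illustration only.
df = pd.DataFrame({
    'Strikeouts per Pitches': [0.050294, 0.056730, 0.064281, 0.078766],
    'Strikes per Pitches': [0.640000, 0.650000, 0.660000, 0.680000],
})

# One pd.cut call per column, each with its own edges.
for source, edges in bins.items():
    df[targets[source]] = pd.cut(
        df[source], bins=edges, labels=categories, include_lowest=True
    )

print(df)
```

One thing to watch: with the default right=True, a value exactly equal to an edge lands in the lower category (0.059794 is labelled 'Average' in the output above), whereas the `>=` comparisons in the question would put it in the higher one. Passing right=False to pd.cut makes the intervals closed on the left instead, which may be closer to the original comparisons, although the first edge would then behave differently from the `<=` in the first branch.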