Say I have a dataframe (df_1) with "high" column limits defined in terms of percentiles like so:
df_1:
A B C
72 57 61
3 52 32
64 51 93
71 79 91
25 23 31
91 90 1
43 44 84
98 91 8
11 49 38
65 33 59
# High Percentiles defined for each column as a variable
A_High = np.percentile(df_1['A'], 60)
B_High = np.percentile(df_1['B'], 60)
C_High = np.percentile(df_1['C'], 60)
#The output for the above variables:
# A_High = 67.39999999999999
# B_High = 54.0
# C_High =59.8
I am trying to create a function that uses the percentile variables above and an IF statement with multiple conditions to filter results of the dataframe. The filtered results are then compiled as 1 or 0 in a separate column called Alerts. I've been able to perform this successfully using the following code:
def Alert(df):
if df['A'] >= 67.39999999999999 and df['B'] >= 54.0 and df['C'] >= 59.8:
return 1
else:
return 0
df_1.insert(3, 'Alert', df_1.apply(Alert, axis=1))
OUTPUT:
A B C Alert
72 57 61 1
3 52 32 0
64 51 93 0
71 79 91 1
25 23 31 0
91 90 1 0
43 44 84 0
98 91 8 0
11 49 38 0
65 33 59 0
But when I define the percentiles within the function itself then it does not generate the desired output.
def Alert(df):
A_High = np.percentile(df['A'], 60)
B_High = np.percentile(df['B'], 60)
C_High = np.percentile(df['C'], 60)
if df['A'] >= A_High and df['B'] >= B_High and df['C'] >= C_High:
return 1
else:
return 0
df_1.insert(3, 'Alert', df_1.apply(Alert, axis=1))
OUTPUT:
A B C Alert
72 57 61 1
3 52 32 1
64 51 93 1
71 79 91 1
25 23 31 1
91 90 1 1
43 44 84 1
98 91 8 1
11 49 38 1
65 33 59 1
I want to run this function over different dataframes and don't want to manually enter the high limit values every time. Would appreciate if someone could help me out here (FYI: New to Python)
CodePudding user response:
Pandas apply function iterates over the rows of dataframe so the calculated values for A_High B_High and C_High will change for each iteration you can create different function which returns those values and pass the values to the Alert function
Example:
def percentiles(df):
A_High = np.percentile(df['A'], 60)
B_High = np.percentile(df['B'], 60)
C_High = np.percentile(df['C'], 60)
return A_High,B_High,C_High
def Alert(df,A_High,B_High,C_High):
if df['A'] >= A_High and df['B'] >= B_High and df['C'] >= C_High:
return 1
else:
return 0
A_High,B_High,C_High=percentiles(df_1)
df_1.insert(3, 'Alert', df_1.apply(Alert, axis=1,args=A_High,B_High,C_High))