I have a function and a rule that should be applied to df:
def apply_rule(df, rule):
    df['legal'] = df.apply(rule)

def greater_than_mean_plus_1_std():
    return df['col1'] > df['col1'].mean() + df['col1'].std()

apply_rule(df, greater_than_mean_plus_1_std)
I want to apply a rule to my df that creates a new column telling me whether each row's value is greater than the mean plus one standard deviation.
But with df.apply(), I can't use df.mean() and df.std() inside the rule:
AttributeError: 'float' object has no attribute 'mean'
Is there a way to do this, or do I have to use something other than df.apply()?
edited:
print(df.head())
   col1
0   7.2
1   7.2
2   7.2
3   7.2
4   7.2
expected output:
   col1  legal
0   7.2  False
1   7.2  False
2   7.2  False
3   7.2  False
4   7.2  False
CodePudding user response:
There is no need to use apply here; a vectorized comparison is enough:
df['legal'] = df['col1'] > (df['col1'].mean() + df['col1'].std())
If you do want to use apply, you can use DataFrame.apply on rows or Series.apply:
df['legal'] = df.apply(lambda row: row['col1'] > (df['col1'].mean() + df['col1'].std()), axis=1)
# or
df['legal'] = df['col1'].apply(lambda x: x > (df['col1'].mean() + df['col1'].std()))
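As a rough sketch of how the two approaches relate, the snippet below computes the threshold once and checks that the vectorized comparison and Series.apply agree. The sample data (range(10)) and the names threshold, vectorized, and via_apply are assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Sample data assumed for illustration; the question's full df is not shown.
df = pd.DataFrame({"col1": range(10)})

# Compute mean + std once, rather than once per row inside the lambda.
threshold = df["col1"].mean() + df["col1"].std()

# Vectorized comparison on the whole column.
vectorized = df["col1"] > threshold

# Element-wise comparison through Series.apply.
via_apply = df["col1"].apply(lambda x: x > threshold)

# Both should yield the same boolean values.
print((vectorized == via_apply).all())
```

Computing the threshold up front also avoids recalculating the mean and standard deviation for every element, which the lambdas above would otherwise do.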
CodePudding user response:
You can use:
def apply_rule(df, rule):
    df['legal'] = rule(df)  # <- change here

def greater_than_mean_plus_1_std(df):  # <- change here
    return df['col1'] > df['col1'].mean() + df['col1'].std()

apply_rule(df, greater_than_mean_plus_1_std)
Output:
# df = pd.DataFrame({'col1': range(10)})
>>> df
   col1  legal
0     0  False
1     1  False
2     2  False
3     3  False
4     4  False
5     5  False
6     6  False
7     7  False
8     8   True
9     9   True
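Because the rule now receives the whole DataFrame, other rules can be passed to the same apply_rule without changing it. A minimal sketch, where below_median is a hypothetical rule invented for this example:

```python
import pandas as pd

def apply_rule(df, rule):
    # The rule gets the whole DataFrame and returns a boolean Series.
    df["legal"] = rule(df)

def below_median(df):
    # Hypothetical rule, for illustration only.
    return df["col1"] < df["col1"].median()

df = pd.DataFrame({"col1": range(10)})
apply_rule(df, below_median)
print(df)
```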
CodePudding user response:
Calculate the mean and std values first:
col1_mean = df["col1"].mean()
col1_std = df["col1"].std()
Then use these pre-calculated values in apply like this:
df["legal"] = df["col1"].apply(lambda x: x > col1_mean + col1_std)
If you want to make it more functional, you can define the rule as a lambda:
col1_mean = df["col1"].mean()
col1_std = df["col1"].std()
greater_than_mean_plus_1_std = lambda x: x > col1_mean + col1_std

def apply_rule(df, rule, column):
    df['legal'] = df[column].apply(rule)
Now call apply_rule:
apply_rule(df, greater_than_mean_plus_1_std, "col1")
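One possible variation, sketched under assumptions: instead of relying on module-level col1_mean and col1_std, a small factory can compute the threshold and return the rule. The factory name make_greater_than_mean_plus_std and its n_std parameter are hypothetical, as is the sample data.

```python
import pandas as pd

def make_greater_than_mean_plus_std(series, n_std=1):
    # Hypothetical helper: computes the threshold once and returns
    # a rule that closes over it.
    threshold = series.mean() + n_std * series.std()
    return lambda x: x > threshold

def apply_rule(df, rule, column):
    # Apply the element-wise rule to one column.
    df["legal"] = df[column].apply(rule)

# Sample data assumed for illustration.
df = pd.DataFrame({"col1": range(10)})
rule = make_greater_than_mean_plus_std(df["col1"])
apply_rule(df, rule, "col1")
print(df)
```

This keeps the statistics tied to the column they were computed from, so the rule cannot silently use stale values if df changes between defining the lambda and calling apply_rule.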