How to use apply() to add new columns with condition of standard deviation?

Time:05-05

I have a function, and a rule to be applied on df.

def apply_rule(df, rule):
    df['legal'] = df.apply(rule)

def greater_than_mean_plus_1_std():
    return df['col1'] > df['col1'].mean() + df['col1'].std()

apply_rule(df, greater_than_mean_plus_1_std)

I want to apply a rule to my df that creates a new column telling me whether each row's value is greater than mean + std.

But with df.apply(), I can't use df.mean() and df.std() here.

AttributeError: 'float' object has no attribute 'mean'

Is there a way to do this, or do I have to use something other than df.apply()?

edited:

print(df.head())

   col1
0   7.2
1   7.2
2   7.2
3   7.2
4   7.2

expected output:

   col1  legal
0   7.2  False
1   7.2  False
2   7.2  False
3   7.2  False
4   7.2  False

CodePudding user response:

There is no need to use apply here:

df['legal'] = df['col1'] > (df['col1'].mean() + df['col1'].std())

If you do want to use apply, you can use either DataFrame.apply on rows (axis=1) or Series.apply:

df['legal'] = df.apply(lambda row: row['col1'] > (df['col1'].mean() + df['col1'].std()), axis=1)
# or
df['legal'] = df['col1'].apply(lambda x: x > (df['col1'].mean() + df['col1'].std()))
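
To make the difference concrete, here is a minimal sketch, assuming a small made-up DataFrame, that checks that the three variants agree. It also hints at where the AttributeError in the question comes from: Series.apply hands the function one scalar at a time, and a float has no .mean(). The sample values and the variable names (threshold, vectorized, elementwise, rowwise) are mine, not from the question.

```python
import pandas as pd

# Made-up sample data, only for illustration.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 10.0]})

# The threshold is a single number computed from the whole column.
threshold = df["col1"].mean() + df["col1"].std()

# Vectorized comparison: the whole column is compared at once.
vectorized = df["col1"] > threshold

# Series.apply calls the function once per element, so x is a plain scalar.
# Calling x.mean() inside this lambda is what raises the AttributeError.
elementwise = df["col1"].apply(lambda x: x > threshold)

# DataFrame.apply with axis=1 passes each row as a Series.
rowwise = df.apply(lambda row: row["col1"] > threshold, axis=1)

df["legal"] = vectorized
print(df)
```

All three give the same booleans. The vectorized form is usually the one to prefer, since it avoids a Python-level call per row.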

CodePudding user response:

You can use:

def apply_rule(df, rule):
    df['legal'] = rule(df)  # <- change here

def greater_than_mean_plus_1_std(df):  # <- change here
    return df['col1'] > df['col1'].mean() + df['col1'].std()

apply_rule(df, greater_than_mean_plus_1_std)

Output:

# df = pd.DataFrame({'col1': range(10)})
>>> df
   col1  legal
0     0  False
1     1  False
2     2  False
3     3  False
4     4  False
5     5  False
6     6  False
7     7  False
8     8   True
9     9   True
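
To see why only 8 and 9 pass, the arithmetic behind that output can be sketched as follows. Note that pandas' std() uses the sample standard deviation (ddof=1) by default, whereas numpy.std defaults to ddof=0. The variable names below are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": range(10)})

mean = df["col1"].mean()   # 4.5
std = df["col1"].std()     # sample std (ddof=1), roughly 3.03
threshold = mean + std     # roughly 7.53

# Only values strictly above the threshold are flagged.
above = df.loc[df["col1"] > threshold, "col1"].tolist()
print(mean, std, threshold, above)
```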

CodePudding user response:

Calculate the mean and std first:

col1_mean = df["col1"].mean()
col1_std = df["col1"].std()

Then use these precomputed values in apply:

df["legal"] = df["col1"].apply(lambda x: x > col1_mean + col1_std)

If you want to keep the rule as a reusable function, you can store it as a lambda:

col1_mean = df["col1"].mean()
col1_std = df["col1"].std()
greater_than_mean_plus_1_std = lambda x: x > col1_mean + col1_std

def apply_rule(df, rule, column):
    df['legal'] = df[column].apply(rule)

Now call apply_rule:

apply_rule(df, greater_than_mean_plus_1_std, "col1")
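
If you would rather not keep the mean and std in module-level variables, one possible variation is to build the rule with a small factory function that computes the statistics once and closes over them. This is only a sketch: make_mean_plus_std_rule and its n_std parameter are hypothetical names I made up, not part of pandas or of the question.

```python
import pandas as pd

def make_mean_plus_std_rule(series, n_std=1.0):
    """Hypothetical helper: return a per-value rule using precomputed statistics."""
    # Computed once here, not once per element inside apply.
    threshold = series.mean() + n_std * series.std()
    return lambda x: x > threshold

def apply_rule(df, rule, column):
    # The rule receives one scalar at a time from Series.apply.
    df["legal"] = df[column].apply(rule)

df = pd.DataFrame({"col1": range(10)})
apply_rule(df, make_mean_plus_std_rule(df["col1"]), "col1")
print(df)
```

The statistics are still calculated a single time, but they now travel with the rule itself, so the same apply_rule can be reused for other columns or thresholds.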