Pandas: Add calculated column based on condition

Time:09-26

I would like to calculate a column based on the values of mean and stdev columns that were calculated in the previous step. I am unable to use the lambda function correctly.

#Import necessary modules
import pandas as pd

data = {
        'A':[1, 2, 3],
        'B':[4, 5, 6],
        'C':[7, 8, 9] }
     
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

data_mean = df.mean(axis=1)
data_stdev = df.std(axis=1)

#Calculate LV column for data df
df['LV'] = df.apply(
    lambda row : 0
    if data_mean < 55.5:
        LV = (55.5-data_mean) + (3.1*data_stdev)
    elif data_mean > 57.5:
        LV = (data_mean-57.5) + (3.1*data_stdev)
    else:
        LV = (3.1*data_stdev), 
    axis = 1)

display(df)

CodePudding user response:

Another approach you could try, which runs at a similar speed to the other answer (if not slightly faster), is to move the conditions into a named function and apply it row by row:


#Import necessary modules
import pandas as pd

def calculate_lv(x):
    if x['MEAN'] < 55.5:
        return (55.5 - x['MEAN']) + (3.1 * x['STDEV'])
    elif x['MEAN'] > 57.5:
        return (x['MEAN'] - 57.5) + (3.1 * x['STDEV'])
    else:
        return x['STDEV'] * 3.1

data = {
        'A':[1, 2, 3],
        'B':[4, 5, 6],
        'C':[7, 8, 9] }
     
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

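# Compute both statistics from the original columns only; otherwise
# STDEV would also include the newly added MEAN column
value_cols = ['A', 'B', 'C']
df['MEAN'] = df[value_cols].mean(axis=1)
df['STDEV'] = df[value_cols].std(axis=1)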


df['LV'] = df.apply(calculate_lv, axis=1)
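
As a side note on the lambda in the question: a lambda body has to be a single expression, so if/elif/else statements cannot go inside it. If you really want to keep a lambda, the branches can be chained as conditional expressions. This is only a minimal sketch; the `value_cols` list is my own assumption about which columns should feed the statistics.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Assumption: only the original A, B, C columns feed the statistics
value_cols = ['A', 'B', 'C']
df['MEAN'] = df[value_cols].mean(axis=1)
df['STDEV'] = df[value_cols].std(axis=1)

# One chained conditional expression instead of if/elif/else statements
df['LV'] = df.apply(
    lambda row: (55.5 - row['MEAN']) + 3.1 * row['STDEV'] if row['MEAN'] < 55.5
    else (row['MEAN'] - 57.5) + 3.1 * row['STDEV'] if row['MEAN'] > 57.5
    else 3.1 * row['STDEV'],
    axis=1,
)

print(df)
```

With the sample data every row mean is below 55.5, so only the first branch is exercised. The named function is usually easier to read once there are more than two branches.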

CodePudding user response:

I would suggest a vectorized approach, as it runs faster than a row-wise apply:


#Import necessary modules
import pandas as pd

data = {
        'A':[1, 2, 3],
        'B':[4, 5, 6],
        'C':[7, 8, 9] }
     
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

data_mean = df.mean(axis=1)
data_stdev = df.std(axis=1)

#Calculate LV column for data df
# base value
df['LV'] = 3.1 * data_stdev
# different values
df.loc[data_mean < 55.5, 'LV'] = (55.5 - data_mean) + (3.1 * data_stdev)
df.loc[data_mean > 57.5, 'LV'] = (data_mean - 57.5) + (3.1 * data_stdev)

display(df)
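
A possible variant of the same vectorized idea, not part of the answer above, is `numpy.select`, which takes a list of conditions, a matching list of values, and a default for rows where no condition holds. The names `conditions` and `choices` are just illustrative; treat this as a sketch rather than the one right way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

data_mean = df.mean(axis=1)
data_stdev = df.std(axis=1)

# Conditions are checked in order; the first one that holds picks the value
conditions = [data_mean < 55.5, data_mean > 57.5]
choices = [
    (55.5 - data_mean) + 3.1 * data_stdev,
    (data_mean - 57.5) + 3.1 * data_stdev,
]

# Rows matching neither condition fall back to the default
df['LV'] = np.select(conditions, choices, default=3.1 * data_stdev)

print(df)
```

This keeps all three branches in one place, which can be easier to extend than a base assignment followed by several `df.loc` overwrites.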
