How to use pandas dataframe column value for multiple if-else conditions to calculate other columns


I have a dataframe and need to apply multiple if-else conditions on the value column, and use them to combine the expo_value and cal_value columns.

Input Dataframe

client_id    expo_value    value    cal_value
1            126           30       27.06
2            135           60       36.18
3            144           120      45
4            162           30       54.09
5            153           90       63.63
6            181           120      72.9
4            207           30       99.09
5            315           90       126.63
6            414           120      81.9

I have written a function that filters on these conditions and applies the corresponding formula for each one:

def cal_df(df):
    if df[df['value'] <= 30]:
        df = df['expo_value'] * 30 + df['cal_value'] * 45
        return df
    elif df[(df['value'] <= 60) and (df['value'] >= 100)]:
        df = df['expo_value'] * 60 + df['cal_value'] * 90
        return df
    elif df[(df['value'] <= 100) and (df['value'] >= 150)]:
        df = df['expo_value'] * 100 + df['cal_value'] * 120
        return df
    elif df[df['value'] <= 10]:
        return np.nan



data = data.groupby('client_id').apply(lambda x:cal_df(x)).reset_index()
The results should be stored as a new column.


When I apply the groupby, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()


CodePudding user response:

If I understand what you want to do, you don't need to do a groupby. You can do this:

data.loc[data['value'] <= 30, "new_column"] = data['expo_value'] * 30 + data['cal_value'] * 45

data.loc[(data['value'] <= 60) & (data['value'] >= 100), "new_column"] = data['expo_value'] * 60 + data['cal_value'] * 90

data.loc[(data['value'] <= 100) & (data['value'] >= 150), "new_column"] = data['expo_value'] * 100 + data['cal_value'] * 120

The values for "new_column" in any rows not captured by the above will be NaN by default. But there is another problem, which is that those last two lines won't match anything, because a value can't be both below 60 and above 100, or below 100 and above 150. Should it be <= for both of them?
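
If the ranges were actually meant to be non-overlapping buckets (say 0-30, 30-60 and 60-100; that is an assumption, the post never states the intended bounds), the same column can be filled in one step with np.select, which evaluates the conditions per row and takes the first one that matches:

import numpy as np

# Assumed buckets; adjust the bounds to the real rules, since the
# conditions in the question contradict each other as written.
conditions = [
    data['value'] <= 30,
    (data['value'] > 30) & (data['value'] <= 60),
    (data['value'] > 60) & (data['value'] <= 100),
]
choices = [
    data['expo_value'] * 30 + data['cal_value'] * 45,
    data['expo_value'] * 60 + data['cal_value'] * 90,
    data['expo_value'] * 100 + data['cal_value'] * 120,
]

# Rows matching none of the conditions fall back to NaN.
data['new_column'] = np.select(conditions, choices, default=np.nan)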

CodePudding user response:

Try this

import numpy as np

def func(value, expo_value, cal_value):
    if value <= 30: return expo_value * 30 + cal_value * 45
    elif value <= 60 and value >= 100: return expo_value * 60 + cal_value * 90
    elif value <= 100 and value >= 150: return expo_value * 100 + cal_value * 120
    else: return np.nan

df['new_values'] = df[['value', 'expo_value', 'cal_value']].apply(lambda x: func(*x.values.tolist()), axis = 1)
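
For what it's worth, a quick check on the first few rows of the question's data (rebuilt below just for the demo) shows the practical effect: because conditions like value <= 60 and value >= 100 can never both hold, only rows with value <= 30 get a number and everything else ends up as NaN:

import numpy as np
import pandas as pd

# First three rows of the input from the question.
df = pd.DataFrame({
    'client_id':  [1, 2, 3],
    'expo_value': [126, 135, 144],
    'value':      [30, 60, 120],
    'cal_value':  [27.06, 36.18, 45.0],
})

df['new_values'] = df[['value', 'expo_value', 'cal_value']].apply(
    lambda x: func(*x.values.tolist()), axis=1)

print(df['new_values'].tolist())
# [4997.7, nan, nan]  (126*30 + 27.06*45 = 4997.7; the later branches never fire)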