Home > Net >  Create a column under if condition doesn't work
Create a column under if condition doesn't work

Time:12-07

I have a data frame that contains some daily,monthly and weekly statistics and lost weight. enter image description here

I would like to create the boolean column that contains the information whether the lost weight was bigger or lower than the threshold. I tried using if loop nad np.where

if df_prod_stats.loc[df_prod_stats['frequency'] == "daily"]:
    df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 0.5)),1,0)

elif df_prod_stats.loc[df_prod_stats['frequency'] == "monthly"]:
    df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 15)),1,0)

else:
    df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 3.5)),1,0)

But i get an error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

I think you will need to do this a different way. I think you're trying to go through each row to see if it's weekly/monthly and checking the loss weight accordingly, however that is not what your code actually does. In the if df_prod_stats.loc[...], the loc will return a subset of the data frame, which will evaluate to true if it has data in, but then your next line of trying to fill in the new column will just apply to the entire original data frame, not the rows that just matched the loc statement. You can achieve what (I think) you want using several loc statements as below:

create target_met column and set to 0:

df_prod_stats['target_met'] = 0

Then use .loc to filter your first if statement condition (frequency is daily, loss weight is less than 0.5), and set target met to be 1:

df_prod_stats.loc[(df_prod_stats['frequency'] == 'daily')
                   & (df_prod_stats['loss_weight'] < 0.5), 'target_met'] = 1

elif condition (frequency is monthly, loss weight is less than 15):

df_prod_stats.loc[(df_prod_stats['frequency'] == 'monthly')
                  & (df_prod_stats['loss_weight'] < 15), 'target_met'] = 1

else condition (frequency is neither daily or monthly, and loss weight is less than 3.5):

df_prod_stats.loc[~(df_prod_stats['frequency'].isin(['daily', 'monthly']))
                  & (df_prod_stats['loss_weight'] < 3.5), 'target_met'] = 1

Put together you get:

df_prod_stats['target_met'] = 0
df_prod_stats.loc[(df_prod_stats['frequency'] == 'daily')
                  & (df_prod_stats['loss_weight'] < 0.5), 'target_met'] = 1
df_prod_stats.loc[(df_prod_stats['frequency'] == 'monthly')
                  & (df_prod_stats['loss_weight'] < 15), 'target_met'] = 1
df_prod_stats.loc[~(df_prod_stats['frequency'].isin(['daily', 'monthly']))
                  & (df_prod_stats['loss_weight'] < 3.5), 'target_met'] = 1

Output:

  frequency  loss_weight  target_met
0     daily        -0.42           1
1     daily        -0.35           1
2     daily        -0.67           1
3     daily        -0.11           1
4     daily        -0.31           1

I hope that is what you're trying to achieve.

CodePudding user response:

I found out it's possible also to use simple set of conditions in np.whereas follows:

df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 0.5) & ( df_prod_stats['frequency'] == "daily")
                                      | (df_prod_stats['loss_weight'] < 15.0) & ( df_prod_stats['frequency'] == "monthly")
                                      | (df_prod_stats['loss_weight'] < 3.5) & ( df_prod_stats['frequency'] == "weekly")),1,0)
  • Related