Run Python function over two DataFrame columns

I am stuck on an issue that I think should be straightforward. I have a function that I would like to apply to two columns of my DataFrame, but I receive an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

To show you what I am trying to do:

import numpy as np

# Calculate the accuracy (absolute percentage error for one actual/forecast pair)
def mape(actual, pred):
  if actual == 0:
    if pred == 0:
      return 0
    else:
      return 100
  else:
    return np.mean(np.abs((actual - pred) / actual)) * 100

Then, I try to apply it on two columns (called Actuals_March & Forecast_March).

# This line runs into the ValueError above. 
# I removed all NaN values before running this. 
df['MAPE_Mar'] = df.apply(lambda x: mape(df.Actuals_March , df.Forecast_March), axis=1)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

# This is a snapshot of my data:
Actuals_March  Forecast_March
          0.0             0.0
          0.0             0.0
          0.0             0.0
          4.0             0.0
          0.0             0.0
          5.0             0.0
         20.0             0.0
          0.0             0.0
          2.0             0.0
         13.0             0.0

Hope you can help me. Thanks in advance

CodePudding user response:

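Replace df with x inside the lambda. With axis=1, x is a single row, so mape receives scalar values. df.Actuals_March is the whole column, and comparing it with 0 yields a boolean Series whose truth value an if statement cannot evaluate: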

df['MAPE_Mar'] = df.apply(lambda x: mape(x.Actuals_March , x.Forecast_March), axis=1)
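A minimal sketch of the difference, assuming a small made-up frame (the sample values are hypothetical, not the asker's data). It calls mape once with whole columns, which raises the ValueError, and once per row through x, which works:

```python
import numpy as np
import pandas as pd


def mape(actual, pred):
    # Scalar version: expects one actual value and one forecast value
    if actual == 0:
        return 0 if pred == 0 else 100
    return np.abs((actual - pred) / actual) * 100


# Hypothetical sample data, chosen to hit every branch of mape
df = pd.DataFrame({
    "Actuals_March": [0.0, 4.0, 0.0, 20.0],
    "Forecast_March": [0.0, 0.0, 3.0, 10.0],
})

# Whole columns: `actual == 0` is a boolean Series, so `if` raises ValueError
try:
    mape(df.Actuals_March, df.Forecast_March)
    raised = False
except ValueError:
    raised = True

# Row by row: x is one row, so x.Actuals_March and x.Forecast_March are scalars
row_wise = df.apply(lambda x: mape(x.Actuals_March, x.Forecast_March), axis=1)

print(raised)
print(row_wise.tolist())
```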

Vectorized alternative:

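# Masks for the special cases handled by mape()
m1 = df['Actuals_March'] == 0
m2 = df['Forecast_March'] == 0
# Absolute percentage error; rows where Actuals_March == 0 are overridden below
s = np.abs((df['Actuals_March'] - df['Forecast_March']) / df['Actuals_March']) * 100

# Both zero -> 0, actual zero with non-zero forecast -> 100, otherwise s
df['MAPE_Mar1'] = np.select([m1 & m2, m1 & ~m2], [0, 100], s)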
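One way to gain confidence in a vectorized version is to compare it with the row-wise result. The sketch below assumes hypothetical sample data and writes the np.select conditions so that they mirror the branches of mape: both values zero gives 0, a zero actual with a non-zero forecast gives 100, and everything else uses the computed percentage.

```python
import numpy as np
import pandas as pd


def mape(actual, pred):
    # Scalar reference implementation
    if actual == 0:
        return 0 if pred == 0 else 100
    return np.abs((actual - pred) / actual) * 100


# Hypothetical sample data, chosen to hit every branch of mape
df = pd.DataFrame({
    "Actuals_March": [0.0, 4.0, 0.0, 20.0],
    "Forecast_March": [0.0, 0.0, 3.0, 10.0],
})

a, p = df["Actuals_March"], df["Forecast_March"]
m1, m2 = a == 0, p == 0

# Rows with a == 0 produce NaN or inf here; np.select replaces them below
s = np.abs((a - p) / a) * 100

vectorized = np.select([m1 & m2, m1 & ~m2], [0, 100], s)
row_wise = df.apply(lambda x: mape(x.Actuals_March, x.Forecast_March), axis=1)

# True when both approaches agree on every row
print(np.allclose(vectorized, row_wise))
```

np.select checks the conditions in order and falls back to the default (here s) where none match, so the division by zero in s never reaches the output.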