I have a pandas dataframe and would like to create a new column based on the below condition:
def confidence_level(row):
    if (row['ctry_one'] == row['ctry_two']) and (row['Market'] == 'yes'):
        return 'H'
    if (row['ctry_one'] == row['ctry_two']) and (row['Market'] == 'no'):
        return 'M'
    if (row['ctry_one'] != row['ctry_two']) and (row['Market'] == 'yes'):
        return 'M'
    if (row['ctry_one'] != row['ctry_two']) and (row['Market'] == 'no'):
        return 'L'

df['status'] = confidence_level(df)
This is the error I receive:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
func_test['Confidence'].value_counts()
Has anyone experienced this before? I tried applying .all() at the end of each argument like below, but this just returns 'None' for everything:
def confidence_level(row):
    if (row['ctry_one'] == row['ctry_two']).all() and (row['Market'] == 'yes').all():
        return 'H'
    if (row['ctry_one'] == row['ctry_two']).all() and (row['Market'] == 'no').all():
        return 'M'
    if (row['ctry_one'] != row['ctry_two']).all() and (row['Market'] == 'yes').all():
        return 'M'
    if (row['ctry_one'] != row['ctry_two']).all() and (row['Market'] == 'no').all():
        return 'L'
CodePudding user response:
You need to call your function for each row, rather than for the whole dataframe, like this:
df['status'] = df.apply(confidence_level, axis=1)
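To make the difference concrete, here is a small sketch. The sample data is made up for illustration (only the column names come from the question), and the function is a slightly condensed version of the one in the question. It shows that passing the whole dataframe raises the ValueError, while apply with axis=1 passes one row at a time, so every comparison is a plain bool:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ctry_one": ["US", "US", "FR", "DE"],
    "ctry_two": ["US", "US", "US", "FR"],
    "Market": ["yes", "no", "yes", "no"],
})

def confidence_level(row):
    same = row["ctry_one"] == row["ctry_two"]
    if same and row["Market"] == "yes":
        return "H"
    if same and row["Market"] == "no":
        return "M"
    if not same and row["Market"] == "yes":
        return "M"
    if not same and row["Market"] == "no":
        return "L"

# Passing the whole frame: each comparison is a boolean Series, and
# `if`/`and` cannot reduce a Series to a single True or False.
try:
    confidence_level(df)
except ValueError as exc:
    error_message = str(exc)

# Passing one row at a time: each comparison is a single bool.
df["status"] = df.apply(confidence_level, axis=1)
print(df)
```

This is also why adding .all() gives None everywhere: on the whole dataframe, .all() is only True when every row satisfies the condition, so with mixed data none of the branches fire, the function falls through and returns None, and that single None is assigned to the entire column.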
That said, using np.select like Mayank's solution, or using .loc like this, will run faster:
def confidence_level(df):
    # Work on a copy so the caller's dataframe is not modified in place
    new_df = df.copy()
    new_df.loc[(df['ctry_one'] == df['ctry_two']) & (df['Market'] == 'yes'), 'status'] = 'H'
    new_df.loc[(df['ctry_one'] == df['ctry_two']) & (df['Market'] == 'no'), 'status'] = 'M'
    new_df.loc[(df['ctry_one'] != df['ctry_two']) & (df['Market'] == 'yes'), 'status'] = 'M'
    new_df.loc[(df['ctry_one'] != df['ctry_two']) & (df['Market'] == 'no'), 'status'] = 'L'
    # Return the copy that holds the new column, not the untouched input
    return new_df

df = confidence_level(df)
CodePudding user response:
Use numpy.select instead, which is more performant and readable:
import numpy as np
conditions = [
    (df['ctry_one'] == df['ctry_two']) & (df['Market'] == 'yes'),
    (df['ctry_one'] == df['ctry_two']) & (df['Market'] == 'no'),
    (df['ctry_one'] != df['ctry_two']) & (df['Market'] == 'yes'),
    (df['ctry_one'] != df['ctry_two']) & (df['Market'] == 'no'),
]
choices = ['H', 'M', 'M', 'L']
df['status'] = np.select(conditions, choices)
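One thing to watch for is rows that match none of the conditions, for example a Market value other than 'yes' or 'no'. Below is a sketch of the same idea with an explicit default. The sample data and the 'unknown' label are my own assumptions, not something from the question. Without default, np.select fills unmatched rows with 0, and as far as I know recent NumPy versions may refuse to combine that integer with string choices, so passing a string default is the safer option:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row has a Market value that
# matches none of the four conditions.
df = pd.DataFrame({
    "ctry_one": ["US", "US", "FR", "DE", "US"],
    "ctry_two": ["US", "US", "US", "FR", "US"],
    "Market": ["yes", "no", "yes", "no", "maybe"],
})

# Name the building blocks once so each condition reads like the rule.
same_country = df["ctry_one"] == df["ctry_two"]
in_market = df["Market"] == "yes"
not_in_market = df["Market"] == "no"

conditions = [
    same_country & in_market,
    same_country & not_in_market,
    ~same_country & in_market,
    ~same_country & not_in_market,
]
choices = ["H", "M", "M", "L"]

# 'unknown' is an assumed placeholder for rows matching no condition.
df["status"] = np.select(conditions, choices, default="unknown")
print(df)
```

Conditions are checked in order and the first match wins, so the order of the list only matters if the conditions overlap; here they are mutually exclusive.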