How can I fix this code? I'm trying to assign the values in a column according to several conditions. The code below gives me an error saying:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
list = [df1,df2,df3,df4] # multiple dataframes
grp_list = ["con", "eco", "dip", "pol"] # multiple categories in a column
for i in list:
    if i['pgp'].isin(group_list) and (i.egp == i.pgp):
        i['value'] == 1
    elif ~i['pgp'].isin(group_list):
        i['value'] == 2
    else:
        i['value'] == 0
Expected output for df1:
pgp egp value
con con 1 # return 1 if pgp value is in the element list & pgp = egp
eco eco 1
dip health 0 # else 0
pol health 0
god con 2
ent eco 2 # return 2 if pgp value is not in the element list
CodePudding user response:
Use np.select:
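import numpy as np

def apply_conditions(df, group_list):
    # True where pgp is one of the listed categories
    in_group_list = df['pgp'].isin(group_list)
    # Conditions are checked in order; the first one that matches wins
    conditions = (
        in_group_list & df['egp'].eq(df['pgp']),
        ~in_group_list
    )
    choices = (
        1,
        2
    )
    # Rows that match neither condition get the default, 0
    df['value'] = np.select(conditions, choices, default=0)
    return df

dfs = [df1,df2,df3,df4]
grp_list = ['con', 'eco', 'dip', 'pol']
dfs = [apply_conditions(df, grp_list) for df in dfs]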
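As for why the original code fails: `if` and `and` need a single True/False, but `isin` and `==` return a whole Series of booleans, one per row. The function above avoids this by combining the masks with `&` and `~`, which work row by row. A minimal sketch of the difference, where the small frame `demo` is made up purely for illustration:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
demo = pd.DataFrame({"pgp": ["con", "god"], "egp": ["con", "con"]})
grp_list = ["con", "eco", "dip", "pol"]

in_group = demo["pgp"].isin(grp_list)   # boolean Series, one value per row
same = demo["egp"].eq(demo["pgp"])      # boolean Series, one value per row

# Python's `if` / `and` ask for one bool, so a Series raises ValueError.
try:
    if in_group and same:
        pass
    raised = False
except ValueError:
    raised = True

# `&` and `~` combine the masks element by element instead.
both = in_group & same
not_in_group = ~in_group

print(raised)                  # True
print(both.tolist())           # [True, False]
print(not_in_group.tolist())   # [False, True]
```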
CodePudding user response:
You can use np.select here:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'pgp': ['con', 'eco', 'dip', 'pol', 'god', 'ent'],
    'egp': ['con', 'eco', 'health', 'health', 'con', 'eco']
})
grp_list = ["con", "eco", "dip", "pol"]

m1 = df1['pgp'].isin(grp_list)  # pgp is one of the listed categories
m2 = df1['pgp'] == df1['egp']   # pgp equals egp

# Conditions are checked in order; rows matching neither get the default
conditions = [m1 & m2, ~m1]
choices = [1, 2]
df1['value'] = np.select(conditions, choices, default=0)
print(df1)
Output:
pgp egp value
0 con con 1
1 eco eco 1
2 dip health 0
3 pol health 0
4 god con 2
5 ent eco 2
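If you would rather keep a loop close to the one in the question, the same two masks can also be used with `.loc` instead of np.select. This is only an alternative sketch, not something the answers above prescribe, and `df2` is a made-up second frame standing in for the other dataframes:

```python
import pandas as pd

grp_list = ["con", "eco", "dip", "pol"]

df1 = pd.DataFrame({
    "pgp": ["con", "eco", "dip", "pol", "god", "ent"],
    "egp": ["con", "eco", "health", "health", "con", "eco"],
})
# Hypothetical second frame, for illustration only.
df2 = pd.DataFrame({"pgp": ["pol", "art"], "egp": ["pol", "pol"]})

for df in [df1, df2]:
    m1 = df["pgp"].isin(grp_list)   # pgp is one of the listed categories
    m2 = df["pgp"] == df["egp"]     # pgp equals egp
    df["value"] = 0                 # the "else" case
    df.loc[m1 & m2, "value"] = 1    # in the list and pgp == egp
    df.loc[~m1, "value"] = 2        # not in the list

print(df1["value"].tolist())  # [1, 1, 0, 0, 2, 2]
print(df2["value"].tolist())  # [1, 2]
```

Note the single `=` for assignment; `i['value'] == 1` in the question only compares and assigns nothing.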