I have a DataFrame and a list:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
        'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan]
    }
)
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
met = 'yes'
else_met = 'no'
For each row, I want to set the column named by new_col to the value of met if the row's ID is in the list, and to else_met if it isn't.
I tried the following code, but it's not working. What am I doing wrong?
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
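This line should run as-is in recent pandas versions. A self-contained reproduction like the sketch below (the same setup plus the line above, assuming a standard pandas/NumPy install) can help check whether the problem lies somewhere else in the surrounding code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
        'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan],
    }
)
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
met = 'yes'
else_met = 'no'

# apply calls the lambda once per ID; `x in ids` is a plain list membership test
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
print(df)
```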
Answer:
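You can use loc and isin to assign met to the matching rows, then fill the remaining rows with else_met using fillna. Note that new_col, met and else_met are variables, so they must not be quoted:
df.loc[df['ID'].isin(ids), new_col] = met
# Reassign rather than calling fillna(..., inplace=True) on a single column (chained assignment)
df[new_col] = df[new_col].fillna(else_met)
ID col.A result
0 AB01 Yes yes
1 AB02 NaN yes
2 AB02 Yes yes
3 AB03 Yes yes
4 AB03 Yes yes
5 AB03 NaN yes
6 AB04 NaN no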
Answer:
Set the default value first, then overwrite the rows whose ID is in ids:
df[new_col] = else_met
df.loc[df['ID'].isin(ids), new_col] = met
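The same "default, then overwrite" idea can also be written in one expression with pandas' Series.where, which keeps values where the condition is True and substitutes the other value elsewhere. This is a swapped-in alternative, not part of the answer above; a minimal sketch using only the ID column:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04']})
ids = ['AB01', 'AB02', 'AB03']
new_col, met, else_met = 'result', 'yes', 'no'

mask = df['ID'].isin(ids)  # Boolean Series aligned with df's index
# Start from a Series filled with met, then replace rows where mask is False with else_met
df[new_col] = pd.Series(met, index=df.index).where(mask, else_met)
print(df)
```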
Answer:
You can use numpy.where to choose between met and else_met based on a Boolean mask, then assign the result to the new column:
df[new_col] = np.where(df['ID'].isin(ids), met, else_met)
OUTPUT:
ID col.A result
0 AB01 Yes yes
1 AB02 NaN yes
2 AB02 Yes yes
3 AB03 Yes yes
4 AB03 Yes yes
5 AB03 NaN yes
6 AB04 NaN no
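If you need this in several places, the same np.where approach can be wrapped in a small helper. The name flag_ids and its parameters are hypothetical, chosen only for illustration; the function works on a copy so the input frame is left unchanged:

```python
import numpy as np
import pandas as pd


def flag_ids(frame: pd.DataFrame, id_col: str, ids, new_col: str,
             met: str, else_met: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with new_col set to met
    where id_col is in ids, and to else_met everywhere else."""
    out = frame.copy()
    mask = out[id_col].isin(ids)  # one Boolean per row
    out[new_col] = np.where(mask, met, else_met)
    return out


df = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04']})
flagged = flag_ids(df, 'ID', ['AB01', 'AB02', 'AB03'], 'result', 'yes', 'no')
print(flagged)
```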
If the method above doesn't work for you, possibly because of a different library version, you can try the following alternative (it is slower, because apply calls a Python function once per row):
df[new_col]=df['ID'].apply(lambda x: met if x in ids else else_met)
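To gauge the speed difference on your own machine, a rough timing sketch with the standard timeit module may help. The larger frame (the seven sample IDs repeated 15,000 times) is an arbitrary assumption, and actual timings will vary:

```python
import timeit

import numpy as np
import pandas as pd

ids = ['AB01', 'AB02', 'AB03']
met, else_met = 'yes', 'no'
# Hypothetical larger frame: the sample IDs repeated to 105,000 rows
big = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'] * 15_000})


def with_apply():
    # One Python-level lambda call per row
    return big['ID'].apply(lambda x: met if x in ids else else_met)


def with_where():
    # Vectorized membership test plus a single np.where call
    return np.where(big['ID'].isin(ids), met, else_met)


# Make sure both approaches agree before comparing their speed
assert with_apply().tolist() == with_where().tolist()

for fn in (with_apply, with_where):
    print(fn.__name__, timeit.timeit(fn, number=10))
```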
Another NumPy approach, combined with a list comprehension:
values = [met if x else else_met for x in np.isin(df['ID'].values, ids)]
# ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
df[new_col] = values
Answer:
You can check, for every ID in df, whether it is in ids:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
        'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan]
    }
)
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
# Build the column with a list comprehension over the ID values
df[new_col] = ['yes' if val in ids else 'no' for val in df['ID']]
Output:
ID col.A result
0 AB01 Yes yes
1 AB02 NaN yes
2 AB02 Yes yes
3 AB03 Yes yes
4 AB03 Yes yes
5 AB03 NaN yes
6 AB04 NaN no
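With a long ids list, each `val in ids` check scans the list from the start. Converting the list to a set once makes each lookup constant time on average. A minimal sketch of that variation, where id_set is a name introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04']})
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'

id_set = set(ids)  # set membership is O(1) on average, list membership is O(len(ids))
df[new_col] = ['yes' if val in id_set else 'no' for val in df['ID']]
print(df)
```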