Insert values for rows based on list


I have a DataFrame and a list:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
    'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan],
})

ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
met = 'yes'
else_met = 'no'

For every row, I want to set new_col to the value in met if the row's ID is in the list, and to the value in else_met if it is not.

I tried the following code, but it's not working. What am I doing wrong?
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
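For reference, here is a self-contained version of the setup above, assuming the DataFrame is built from a dict of columns. Run this way, the apply call itself completes and fills the column as intended, so if it fails in your environment the cause may lie outside this line (for example, in how df is constructed).

```python
import numpy as np
import pandas as pd

# Build the frame from a dict mapping column names to lists of values.
df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
    'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan],
})

ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
met = 'yes'
else_met = 'no'

# Row-by-row membership test of each ID against the plain Python list.
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
print(df)
```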

Answer:

You can use loc and isin to assign met to the matching rows, and then fillna the remaining rows with else_met:

df.loc[df['ID'].isin(ids), new_col] = met
df[new_col] = df[new_col].fillna(else_met)

     ID col.A result
0  AB01   Yes    yes
1  AB02   NaN    yes
2  AB02   Yes    yes
3  AB03   Yes    yes
4  AB03   Yes    yes
5  AB03   NaN    yes
6  AB04   NaN     no

Answer:

Initialize the whole column with else_met, then overwrite the rows whose ID is in ids:

df[new_col] = else_met
df.loc[df['ID'].isin(ids), new_col] = met

Answer:

You can use numpy.where to choose a value for each row based on a Boolean mask, then assign the result to the new column:

df[new_col] = np.where(df['ID'].isin(ids), met, else_met)


     ID col.A result
0  AB01   Yes    yes
1  AB02   NaN    yes
2  AB02   Yes    yes
3  AB03   Yes    yes
4  AB03   Yes    yes
5  AB03   NaN    yes
6  AB04   NaN     no
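The Boolean mask that np.where consumes can be inspected directly. A minimal sketch, using only the ID column from the question and the literal values 'yes'/'no' in place of met/else_met:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
})
ids = ['AB01', 'AB02', 'AB03']

# isin returns a Boolean Series aligned with the rows of df.
mask = df['ID'].isin(ids)
print(mask.tolist())  # [True, True, True, True, True, True, False]

# np.where takes 'yes' where the mask is True and 'no' where it is False.
labels = np.where(mask, 'yes', 'no')
print(labels.tolist())  # ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
```

Because the whole mask is computed in one vectorized call, this avoids calling a Python function once per row.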

If the method above does not work for you, possibly because of a different library version, you can try the following alternative (it is less efficient, because apply calls the Python lambda once per row):

df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)

Another approach combines NumPy with a list comprehension:

values = [met if x else else_met for x in np.isin(df['ID'].values, ids)]
# ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
df[new_col] = values

Answer:

You can check, for every ID in df, whether it is in ids:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
    'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan],
})

ids = ['AB01', 'AB02', 'AB03']

new_col = 'result'

df[new_col] = ['yes' if val in ids else 'no' for val in df['ID']]

     ID col.A result
0  AB01   Yes    yes
1  AB02   NaN    yes
2  AB02   Yes    yes
3  AB03   Yes    yes
4  AB03   Yes    yes
5  AB03   NaN    yes
6  AB04   NaN     no
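As a quick consistency check, the sketch below runs the approaches from this thread side by side on copies of the same frame and compares the resulting columns. It assumes the question's values (new_col = 'result', met = 'yes', else_met = 'no'); the copy names a to d are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
    'col.A': ["Yes", np.nan, 'Yes', 'Yes', "Yes", np.nan, np.nan],
})
ids = ['AB01', 'AB02', 'AB03']
new_col, met, else_met = 'result', 'yes', 'no'

# 1. loc assigns met to matching rows (others become NaN), then fillna.
a = df.copy()
a.loc[a['ID'].isin(ids), new_col] = met
a[new_col] = a[new_col].fillna(else_met)

# 2. Default value first, then overwrite the matching rows.
b = df.copy()
b[new_col] = else_met
b.loc[b['ID'].isin(ids), new_col] = met

# 3. Vectorized selection with np.where.
c = df.copy()
c[new_col] = np.where(c['ID'].isin(ids), met, else_met)

# 4. Plain list comprehension over the ID column.
d = df.copy()
d[new_col] = [met if v in ids else else_met for v in d['ID']]

results = [frame[new_col].tolist() for frame in (a, b, c, d)]
all_agree = all(r == results[0] for r in results)
print(all_agree, results[0])
```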