This should be a simple one.
I want to create a new column that is populated with Y and N, based on certain conditions:
If a value from id in df1 exists in id in df2, return Y, else N.
If a value in id in df1 is NaN, return N. (The id column is numeric.)
What I have is:
df1['new_col']=df1['id'].isin(df2['id']).replace({False: 'N', True:'Y'})
which is fine, except that I also get Y for empty values, whereas I want N for NaN values.
So I've tried the following, but neither works:
df1['new_col']=df1['new_col'].apply(lambda x: 'N' if pd.isnull(x)==True else x['new_col'])
or
def ss_am(x):
    if x['id'].isin(df2['id']):
        return 'Y'
    if x['id']==pd.isnull(x):
        return 'N'
    else:
        return 'N'
df['new_col']=df.apply(ss_am, 1)
Example:
CodePudding user response:
Assuming the 'id' column contains strings (since I don't see NaN in the output) and the value is always > 0,
df1['new_col'] = df1['id'].replace('', '-1').isin(df2['id']).replace({False: 'N', True:'Y'})
can work. That temporarily substitutes '-1' for the blanks, which df2 is very unlikely to have.
If the blanks are NaNs instead (and the ids are integers/floats), you can use fillna() with a suitable value, i.e. one that cannot occur in df2:
df1['new_col'] = df1['id'].fillna(-1).isin(df2['id']).replace({False: 'N', True:'Y'})
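If you would rather not pick a placeholder value at all, here is a minimal sketch of the same idea: combine isin() with notna(), so a NaN id can never count as a match, and map the resulting boolean mask to Y/N with numpy.where. The sample frames are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; df2 deliberately contains a NaN id
df1 = pd.DataFrame({"id": [1.0, 2.0, np.nan, 4.0]})
df2 = pd.DataFrame({"id": [2.0, 4.0, np.nan]})

# True only where the id is present in df2 AND is not NaN
mask = df1["id"].isin(df2["id"]) & df1["id"].notna()

# Map the boolean mask to the Y/N labels
df1["new_col"] = np.where(mask, "Y", "N")
print(df1)
```

This avoids relying on -1 (or any other sentinel) being absent from df2.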
CodePudding user response:
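This can also be done by merging the two DataFrames:
# Build a lookup of the ids in df2, flagged 'Y'
# (drop NaN and duplicate ids so they cannot match NaN or multiply rows)
flags = df2[['id']].dropna().drop_duplicates().assign(new_col='Y')
# A left merge keeps exactly the rows of df1 (an outer merge would also add rows found only in df2)
output_df = pd.merge(df1, flags, how='left', on='id')
# Ids without a match, including NaN, get 'N'
output_df['new_col'] = output_df['new_col'].fillna('N')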
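The merge approach can also be written with the indicator option of merge, which reports for each row whether the key was found on both sides. The following is only a sketch on invented sample data; the names keys and merged are placeholders, and it assumes a left merge is wanted so that output_df has exactly the rows of df1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; df2 has a duplicate id and a NaN id
df1 = pd.DataFrame({"id": [1.0, 2.0, np.nan, 4.0]})
df2 = pd.DataFrame({"id": [2.0, 2.0, 4.0, np.nan]})

# Unique, non-NaN ids from df2, so the merge cannot duplicate
# df1 rows or pair a NaN in df1 with a NaN in df2
keys = df2[["id"]].dropna().drop_duplicates()

# indicator=True adds a "_merge" column: "both" or "left_only"
merged = df1.merge(keys, on="id", how="left", indicator=True)

merged["new_col"] = np.where(merged["_merge"] == "both", "Y", "N")
output_df = merged.drop(columns="_merge")
print(output_df)
```

Compared with isin(), this is more code for a single key column, but it extends naturally to matching on several columns at once.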