This question follows up on an earlier post, "Is there better way to iterate over nested loop for rows (30000)?". I wrote a for loop that checks whether each email address in my main dataframe (df) appears in my control dataframe (df_controle) and, if it does, writes the value 'ja' into a new column of the main dataframe (df).
import pandas as pd
data={'Name':['Danny','Damny','Monny','Quony','Dimny','Danny'],
'Email':['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']}
df=pd.DataFrame(data)
data1={'Name':['Danny','Monny','Quony'],
'Email':['[email protected]','[email protected]','[email protected]']}
df_controle=pd.DataFrame(data1)
df['email_found_in_control_list']=None
col_email=df_controle.columns.get_loc("Email")
row_count=len(df.index)
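for i in range(0,row_count):
    emailadres=df['Email'][i]
    # loop over every row of df_controle; range(0, col_email) only covered rows up to the column index
    for k in range(0, len(df_controle.index)):
        if emailadres==df_controle.iloc[k,col_email]:
            # .loc avoids chained assignment, which may not write back to df
            df.loc[i,'email_found_in_control_list'] = 'ja'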
df.head()
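The nested loop compares every row of df against every row of df_controle, so the work grows with the product of the two lengths (30000 × the size of the control list). A minimal sketch of a plain-Python alternative, assuming the same column names: build a set of control addresses once, then test each address with an average O(1) set lookup. The email addresses below are hypothetical placeholders, not the original data.

```python
import pandas as pd

# Hypothetical placeholder data with the same shape as the question's frames.
df = pd.DataFrame({
    "Name": ["Danny", "Damny", "Monny", "Quony", "Dimny", "Danny"],
    "Email": ["danny@example.com", "damny@example.com", "monny@example.com",
              "quony@example.com", "dimny@example.com", "danny@example.com"],
})
df_controle = pd.DataFrame({
    "Name": ["Danny", "Monny", "Quony"],
    "Email": ["danny@example.com", "monny@example.com", "quony@example.com"],
})

# Build the lookup set once: one pass over the control list.
control_emails = set(df_controle["Email"])

# One pass over the main frame; each membership test is a hash lookup,
# so there is no inner loop over df_controle.
df["email_found_in_control_list"] = [
    "ja" if email in control_emails else None for email in df["Email"]
]
print(df)
```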
Answer:
If you merge the two dataframes on the email column, wouldn't that work? Emails without a match in the control list come back as NaN in the merged control column.
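import numpy as np
# assumes df and df_controle from the question
df_controle=df_controle.rename(columns={'Email':'control_email'})
# note: a duplicated address in df_controle would duplicate the matching rows in final
final=df.merge(df_controle,how='left',left_on='Email',right_on='control_email')
# NaN never equals NaN, so test with .isna() instead of ==np.nan; unmatched rows get None, matched rows get "ja"
final['email_found_in_control_list']=np.where(final['control_email'].isna(),None,'ja')
final=final[['Name_x','Email','email_found_in_control_list']].rename(columns={'Name_x':'Name'})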
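A variant of the same merge idea, sketched under the assumption that only the Email column of the control list matters: pandas' `merge(..., indicator=True)` adds a `_merge` column that is "both" for rows found on each side, so no renaming or NaN check is needed. Dropping duplicate control addresses first keeps the result at one row per row of df. The addresses are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data; the control list contains one duplicate on purpose.
df = pd.DataFrame({
    "Name": ["Danny", "Damny", "Monny", "Quony", "Dimny", "Danny"],
    "Email": ["danny@example.com", "damny@example.com", "monny@example.com",
              "quony@example.com", "dimny@example.com", "danny@example.com"],
})
df_controle = pd.DataFrame({
    "Name": ["Danny", "Monny", "Quony", "Quony"],
    "Email": ["danny@example.com", "monny@example.com",
              "quony@example.com", "quony@example.com"],
})

# Keep only unique control addresses so the left merge cannot multiply rows.
controls = df_controle[["Email"]].drop_duplicates()

# indicator=True adds "_merge": "both" if the address was found, "left_only" otherwise.
merged = df.merge(controls, on="Email", how="left", indicator=True)
merged["email_found_in_control_list"] = np.where(merged["_merge"] == "both", "ja", None)
merged = merged.drop(columns="_merge")
print(merged)
```

A left merge keeps the row order of df, so the new column lines up with the original rows.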
Answer:
I found the answer in this post: "For loop to replace value in one dataframe with X if it appears in another dataframe".
The solution is:
import pandas as pd
import numpy as np
data={'Name':['Danny','Damny','Monny','Quony','Dimny','Danny'],
'Email':['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']}
df=pd.DataFrame(data)
data1={'Name':['Danny','Monny','Quony'],
'Email':['[email protected]','[email protected]','[email protected]']}
df_controle=pd.DataFrame(data1)
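# np.isin replaces np.in1d (deprecated since NumPy 2.0); None instead of the string "None" matches the question's empty value
df["email_found_in_control_list"] = list(map(lambda x: "ja" if x else None, np.isin(df.Email, df_controle.Email)))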
df.head()
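The same membership test can be written with pandas alone: `Series.isin` returns a boolean mask aligned with the rows of df, and `np.where` turns it into 'ja'/None without a Python-level loop or lambda. The sketch below also assumes, as an extra that the original answer does not include, that addresses should match regardless of letter case and surrounding spaces; the `normalize` helper and all addresses are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data; one control address differs only in case and spacing.
df = pd.DataFrame({
    "Name": ["Danny", "Damny", "Monny", "Quony", "Dimny", "Danny"],
    "Email": ["danny@example.com", "damny@example.com", "monny@example.com",
              "quony@example.com", "dimny@example.com", "danny@example.com"],
})
df_controle = pd.DataFrame({
    "Name": ["Danny", "Monny", "Quony"],
    "Email": ["danny@example.com", " MONNY@example.com", "quony@example.com"],
})

def normalize(emails: pd.Series) -> pd.Series:
    # Hypothetical helper (assumption): compare addresses case-insensitively, ignoring outer spaces.
    return emails.str.strip().str.lower()

# Boolean mask, one entry per row of df: True where the address is in the control list.
found = normalize(df["Email"]).isin(normalize(df_controle["Email"]))
df["email_found_in_control_list"] = np.where(found, "ja", None)
print(df)
```

If exact matching is wanted, drop the `normalize` calls and use `df["Email"].isin(df_controle["Email"])` directly.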