Is there a better way to compare a main dataframe with a control dataframe for rows (30000)?

This question is a follow-up to the one asked in the post "Is there better way to iterate over nested loop for rows (30000)?". I wrote a for loop that checks whether each email address in my main dataframe (df) appears in my control dataframe (df_controle) and, if it does, sets a column in the main dataframe (df) to 'ja'.

import pandas as pd
data={'Name':['Danny','Damny','Monny','Quony','Dimny','Danny'],
      'Email':['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']}
df=pd.DataFrame(data)
data1={'Name':['Danny','Monny','Quony'],
      'Email':['[email protected]','[email protected]','[email protected]']}
df_controle=pd.DataFrame(data1)
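df['email_found_in_control_list']=None
col_email=df_controle.columns.get_loc("Email")
row_count=len(df.index)
control_count=len(df_controle.index)
for i in range(0,row_count):
    emailadres=df['Email'][i]
    # compare against every row of the control dataframe, not just the first
    for k in range(0, control_count):
        if emailadres==df_controle.iloc[k,col_email]:
            # .loc writes to df directly; chained indexing may not write back
            df.loc[i,'email_found_in_control_list'] = 'ja'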
df.head()

CodePudding user response：

If you merge the two dataframes with a left join and check which rows have no match, wouldn't that work? Emails that are not in the control dataframe come back as NaN in the control column.

import numpy as np
df_controle=df_controle.rename(columns={'Email':'control_email'})

final=df.merge(df_controle,how='left',left_on='Email',right_on='control_email')
final['email_found_in_control_list']=np.where(final['control_email'].isna(),None,'ja') # 'ja' if matched, else None (NaN never equals NaN, so test with isna())
final=final[['Name_x','Email','email_found_in_control_list']].rename(columns={'Name_x':'Name'})
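A variant of the merge idea, as a minimal sketch rather than the answerer's exact method: it asks merge for an indicator column instead of renaming the control column, and it deduplicates the control emails first so the left join cannot multiply rows. The example.com addresses are hypothetical placeholders, because the addresses in the post are obfuscated.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder addresses (the originals are obfuscated in the post).
df = pd.DataFrame({
    "Name": ["Danny", "Damny", "Monny", "Quony", "Dimny", "Danny"],
    "Email": ["danny@example.com", "damny@example.com", "monny@example.com",
              "quony@example.com", "dimny@example.com", "danny@example.com"],
})
df_controle = pd.DataFrame({
    "Name": ["Danny", "Monny", "Quony"],
    "Email": ["danny@example.com", "monny@example.com", "quony@example.com"],
})

# Keep only the key column and drop duplicates so each main row matches at most once.
control_emails = df_controle[["Email"]].drop_duplicates()

# indicator=True adds a "_merge" column: "both" for matches, "left_only" otherwise.
final = df.merge(control_emails, how="left", on="Email", indicator=True)

# 'ja' where the email was found in the control list, missing otherwise.
final["email_found_in_control_list"] = np.where(final["_merge"] == "both", "ja", None)
final = final.drop(columns="_merge")
```

Because only the Email column of the control dataframe is merged, there is no Name_x / Name_y pair to clean up afterwards.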

CodePudding user response：

I found the answer in this post: "For loop to replace value in one dataframe with X if it appears in another dataframe".

The solution is here:

import pandas as pd
import numpy as np
data={'Name':['Danny','Damny','Monny','Quony','Dimny','Danny'],
      'Email':['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']}
df=pd.DataFrame(data)
data1={'Name':['Danny','Monny','Quony'],
      'Email':['[email protected]','[email protected]','[email protected]']}
df_controle=pd.DataFrame(data1)
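# np.isin replaces np.in1d, which is deprecated in NumPy 2.0
# None (not the string "None") marks emails missing from the control list
df["email_found_in_control_list"] = list(map(lambda x: "ja" if x else None, np.isin(df.Email, df_controle.Email)))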
df.head()
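The same membership test can also be written with pandas' own Series.isin, which avoids the Python-level lambda. The following is a sketch under assumptions: flag_known_emails is a hypothetical helper name, and the example.com addresses are placeholders because the addresses in the post are obfuscated.

```python
import numpy as np
import pandas as pd


def flag_known_emails(main, control, column="Email", flag="ja"):
    """Return a copy of `main` with a column marking emails present in `control`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # One vectorized membership test replaces the nested Python loops.
    mask = main[column].isin(control[column])
    out = main.copy()
    # `flag` where the email is in the control list, missing otherwise.
    out["email_found_in_control_list"] = np.where(mask, flag, None)
    return out


# Hypothetical placeholder addresses.
df = pd.DataFrame({
    "Name": ["Danny", "Damny", "Monny", "Quony", "Dimny", "Danny"],
    "Email": ["danny@example.com", "damny@example.com", "monny@example.com",
              "quony@example.com", "dimny@example.com", "danny@example.com"],
})
df_controle = pd.DataFrame({
    "Name": ["Danny", "Monny", "Quony"],
    "Email": ["danny@example.com", "monny@example.com", "quony@example.com"],
})

result = flag_known_emails(df, df_controle)
```

Unlike the merge approach, isin never changes the number of rows, even when the control dataframe contains duplicate addresses.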