I want to create and populate a column ['EX_Excluded']
in a dataframe df_master_ls_and_ips,
depending on whether an IP is found in a column of another dataframe, df_ip_excluded['EX_net_ip'].
row holds the IP address being iterated over; if that IP exists among the excluded IPs, the cell value should be "Excluded", otherwise "Not Excluded".
Below is the last "working" version of the line:
df_master_ls_and_ips['EX_Excluded'] = df_master_ls_and_ips['LS_ip'].apply (lambda row:
"Excluded" if df_ip_excluded['EX_net_ip'].any() == row else "Not Excluded")
I tried different ways to iterate through the df_ip_excluded['EX_net_ip']
column and compare it with the row variable from the lambda, but none of them work.
All cells in the new column df_master_ls_and_ips['EX_Excluded']
are filled with "Excluded", and it takes a long time to complete. I guess an iteration inside an iteration is not good, but I am a bit lost.
Any help is appreciated.
CodePudding user response:
IIUC, your test should be checking for membership in the column's values (using in on the Series itself checks its index, hence .values):
df_master_ls_and_ips['EX_Excluded'] = df_master_ls_and_ips['LS_ip'].apply(lambda row: "Excluded" if row in df_ip_excluded['EX_net_ip'].values
else "Not Excluded")
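As a small illustration of why the membership test needs the values rather than the Series itself, here is a sketch using a made-up Series (the sample IPs and the variable names are assumptions for the example, not from the question):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
excluded_ips = pd.Series(["10.0.0.1", "10.0.0.2"], name="EX_net_ip")

# `in` on a Series tests the index labels (0 and 1 here), not the values.
label_in_series = 0 in excluded_ips            # 0 is an index label
ip_in_series = "10.0.0.1" in excluded_ips      # the string is not an index label
ip_in_values = "10.0.0.1" in excluded_ips.values  # checks the underlying array

print(label_in_series, ip_in_series, ip_in_values)
```

Checking against .values (or a list or set built from the column) compares the IP with the stored values.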
You could also convert df_ip_excluded['EX_net_ip']
to a set and use a list comprehension (this should be faster than calling the lambda, since set lookups do not scan the whole column):
excluded = set(df_ip_excluded['EX_net_ip'])
df_master_ls_and_ips['EX_Excluded'] = ["Excluded" if row in excluded else "Not Excluded" for row in df_master_ls_and_ips['LS_ip']]
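We could also use numpy broadcasting. Note that this gives a boolean column (True where the IP is excluded) rather than the "Excluded"/"Not Excluded" labels, and it builds a full comparison matrix in memory:
df_master_ls_and_ips['EX_Excluded'] = (df_master_ls_and_ips['LS_ip'].to_numpy()[:, None]==df_ip_excluded['EX_net_ip'].to_numpy()).any(axis=1)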
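If the string labels are wanted from a boolean mask, one option is np.where. The mask can also come from Series.isin, which is a different technique from broadcasting and avoids the pairwise comparison. A minimal sketch, assuming small made-up frames with the same column names as in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df_master_ls_and_ips = pd.DataFrame({"LS_ip": ["10.0.0.1", "10.0.0.5", "10.0.0.2"]})
df_ip_excluded = pd.DataFrame({"EX_net_ip": ["10.0.0.1", "10.0.0.2"]})

# Boolean mask: True where LS_ip appears among the excluded IPs.
is_excluded = df_master_ls_and_ips["LS_ip"].isin(df_ip_excluded["EX_net_ip"])

# Map the mask to the two labels.
df_master_ls_and_ips["EX_Excluded"] = np.where(is_excluded, "Excluded", "Not Excluded")

print(df_master_ls_and_ips)
```

This does not loop in Python, so it should scale better than the apply version on large frames.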