Hi, I have two dataframes: a parent dataframe and an incremental dataframe. I want to extract the records that are present in the incremental dataframe but not in the parent dataframe, based on the key column.
Example:
Key Column : call_id
parent_dataframe:
call_id  call_nm  src
100      QC       Darzalex MM
105      XY       INVOKANA
107      CZ       Simponi RA
117      NM       Guselkumab PSA
118      YC       STELARA
126      RF       INVOKANA
Incremental Dataframe:
call_id  call_nm  src
118      YC       STELARA
126      RF       INVOKANA
131      VG       STELARA
135      IJ       Stelara CD
Unmatched Dataframe:
call_id  call_nm  src
131      VG       STELARA
135      IJ       Stelara CD
CodePudding user response:
Use a left_anti join with the incremental dataframe on the left. A left_anti join checks whether each left row's join key exists in the right dataframe and keeps only the left rows that have no match there.
Incremental.join(parent_dataframe, on='call_id', how='left_anti').show()
+-------+-------+----------+
|call_id|call_nm|       src|
+-------+-------+----------+
|    135|     IJ|Stelara CD|
|    131|     VG|   STELARA|
+-------+-------+----------+
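If it helps to see what left_anti is doing, here is a minimal plain-Python sketch of the same idea, with no Spark involved. It is only an illustration of the semantics, not how Spark implements the join. The rows are the ones from the question, keyed on call_id; the helper name left_anti and its key_index parameter are made up for this example.

```python
# Plain-Python sketch of left_anti semantics (illustration only, no Spark).
# Rows are (call_id, call_nm, src) tuples copied from the question's example.
parent_rows = [
    (100, "QC", "Darzalex MM"),
    (105, "XY", "INVOKANA"),
    (107, "CZ", "Simponi RA"),
    (117, "NM", "Guselkumab PSA"),
    (118, "YC", "STELARA"),
    (126, "RF", "INVOKANA"),
]
incremental_rows = [
    (118, "YC", "STELARA"),
    (126, "RF", "INVOKANA"),
    (131, "VG", "STELARA"),
    (135, "IJ", "Stelara CD"),
]


def left_anti(left, right, key_index=0):
    """Hypothetical helper: keep rows of `left` whose key is absent from `right`."""
    # Collect the right-hand keys once so each lookup is a cheap set membership test.
    right_keys = {row[key_index] for row in right}
    return [row for row in left if row[key_index] not in right_keys]


# Incremental goes on the left, so only its unmatched rows survive.
unmatched = left_anti(incremental_rows, parent_rows)
for row in unmatched:
    print(row)
```

Note that the order of the arguments matters: swapping them would instead return the parent rows that are missing from the incremental data.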