I have two similar columns in two different data frames.
I want to compare these columns and return the values that do not match each other. Example:
df_1["detail"]= ["X25", "i20", "Sunny120", "A22" ]
df_2["temp_detail"]= ["i20", "A22", "sunnY120", "X 25"]
Expected output:
X25
Sunny120
These values are not the same: one has a spacing error and the other a case error.
Can anyone please help me with the code for this in PySpark?
CodePudding user response:
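You can use a left_anti join for that. It keeps only the rows of the left data frame that have no match in the right one. Note that column equality in PySpark is written with == (the === operator is Scala syntax).
df_1.join(df_2, df_1.detail == df_2.temp_detail, "left_anti").select("detail").show()
df_2.join(df_1, df_1.detail == df_2.temp_detail, "left_anti").select("temp_detail").show()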