Comparing two columns in a dataframes in pyspark

Time:10-01

I have two data frames, each with a column holding similar values.

I want to compare these columns and return the values that do not match each other. For example:

df_1["detail"]= ["X25", "i20", "Sunny120", "A22" ]
df_2["temp_detail"]= ["i20", "A22", "sunnY120", "X 25"]

Expected output:

X25 
Sunny120

These values are not the same: one pair differs by a space ("X25" vs "X 25") and the other by letter case ("Sunny120" vs "sunnY120").

Can anyone help me write this in PySpark?

CodePudding user response:

You can use a left_anti join for that. It keeps only the rows of the left data frame that have no match in the right one. Note that in PySpark the column comparison uses Python's == operator (=== is the Scala syntax):

df_1.join(df_2, df_1.detail == df_2.temp_detail, "left_anti").select("detail").show()

df_2.join(df_1, df_1.detail == df_2.temp_detail, "left_anti").select("temp_detail").show()