Zero joins with null in the null-safe join


I have noticed that 0 joins with null when using a null-safe join (eqNullSafe).

df1 = spark.createDataFrame([(1, ), (None, )], ['df1_id'])
df2 = spark.createDataFrame([(None, ), (0, )], ['df2_id'])

df1.join(df2, df1.df1_id.eqNullSafe(df2.df2_id), 'right').show()
# +------+------+
# |df1_id|df2_id|
# +------+------+
# |  null|     0|
# |  null|  null|
# +------+------+

df2.join(df1, df1.df1_id.eqNullSafe(df2.df2_id), 'left').show()
# +------+------+
# |df2_id|df1_id|
# +------+------+
# |     0|  null|
# |  null|  null|
# +------+------+

How do I make null join only with null?

CodePudding user response:

You need to do an inner join here:

df1.join(df2, df1.df1_id.eqNullSafe(df2.df2_id), 'inner').show()
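
With the sample DataFrames from the question, the inner join should keep only the pair where the null-safe comparison is true, so the expected output is a single null/null row:

# +------+------+
# |df1_id|df2_id|
# +------+------+
# |  null|  null|
# +------+------+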

Right now the 0 in the right DataFrame has no match in the left DataFrame. Because you are doing a right join, PySpark keeps that unmatched right-side row anyway and pads df1_id with null. The null you see next to 0 is not a join match; it is the padding an outer join produces for unmatched rows.
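
To see that the 0/null pairing comes from the outer join rather than from eqNullSafe, here is a minimal sketch reusing the df1 and df2 defined in the question, but with a plain equality condition. Plain == never matches anything against null, so no rows join at all, yet the right join still emits both right-side rows with df1_id padded to null (row order may vary):

# With plain equality there are zero matches, but the right join
# keeps every right-side row and fills the left columns with null.
df1.join(df2, df1.df1_id == df2.df2_id, 'right').show()
# +------+------+
# |df1_id|df2_id|
# +------+------+
# |  null|  null|
# |  null|     0|
# +------+------+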
