Apache Spark: How deep is the comparison of rows in an RDD or DF

Time:06-30

I want to understand the behavior of DF.intersect().

How deep does the row comparison go, especially when the Rows are complex and contain complex fields (a deep tree)?

CodePudding user response:

If you mean the DataFrame intersect transformation, then according to the Dataset documentation and source, the comparison is performed directly on the encoded content, so it goes as deep as it possibly can: every nested field takes part in the comparison. From the documentation:

def intersect(other: Dataset[T]): Dataset[T]
Returns a new Dataset containing rows only in both this Dataset and another Dataset. This is equivalent to INTERSECT in SQL.

Since 1.6.0

Note: Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
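To see this in practice, here is a minimal sketch that can be pasted into spark-shell. The case classes Person and Address, the sample values, and the local session settings are all hypothetical and only serve as illustration. Address deliberately overrides equals to always return true, so that the effect described in the note can be observed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical nested types, used only for illustration.
// Address overrides equals/hashCode so that any two addresses look "equal" on the JVM side.
case class Address(city: String, zip: String) {
  override def equals(other: Any): Boolean = true
  override def hashCode: Int = 0
}
case class Person(name: String, address: Address, tags: Seq[String])

// Assumed local session; in spark-shell this simply returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("intersect-depth")
  .getOrCreate()
import spark.implicits._

val left = Seq(
  Person("Ann", Address("Oslo", "0150"), Seq("a", "b")),
  Person("Bob", Address("Rome", "00100"), Seq("c"))
).toDS()

val right = Seq(
  Person("Ann", Address("Oslo", "0150"), Seq("a", "b")), // identical at every level
  Person("Bob", Address("Rome", "00199"), Seq("c"))      // differs only in the nested zip
).toDS()

// intersect compares the encoded columns, including the nested struct and the array.
val common = left.intersect(right).collect()

common.foreach(p => println(p.name))
```

Only the "Ann" row should survive: the two "Bob" rows differ in a nested field, and the custom equals on Address is not consulted, which matches the note in the documentation.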
