I want to understand the behavior of DF.intersect().
The question came to mind especially for complex Rows whose fields are themselves complex (a deep tree): how deeply are such rows compared?
CodePudding user response:
If we are talking about the DataFrame intersect transformation, then, according to the Dataset documentation and source, the comparison is done directly on the encoded content, which is as deep as it can possibly go.
def intersect(other: Dataset[T]): Dataset[T]
Returns a new Dataset containing rows only in both this Dataset and another Dataset. This is equivalent to INTERSECT in SQL.
Since 1.6.0
Note: Equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
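To see what this means for nested rows, here is a minimal sketch. It assumes Spark 2.0 or later (for SparkSession) on the classpath and a local master. The case classes Address and Person, the object IntersectNestedSketch, and the sample data are all hypothetical, made up only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical nested types, used only for illustration.
case class Address(city: String, zip: String)
case class Person(name: String, address: Address, tags: Seq[String])

object IntersectNestedSketch {

  // Builds two small Datasets and returns the rows present in both.
  def commonPeople(spark: SparkSession): Array[Person] = {
    import spark.implicits._

    val left = Seq(
      Person("ann", Address("Oslo", "0150"), Seq("a", "b")),
      Person("bob", Address("Bergen", "5003"), Seq("c"))
    ).toDS()

    // "bob" differs from the left side only in the nested zip field.
    val right = Seq(
      Person("ann", Address("Oslo", "0150"), Seq("a", "b")),
      Person("bob", Address("Bergen", "5004"), Seq("c"))
    ).toDS()

    // The comparison covers every encoded field, including nested ones.
    left.intersect(right).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally, no cluster
      .appName("intersect-nested-sketch")
      .getOrCreate()
    try {
      // Only the "ann" row should survive, since "bob" differs in address.zip.
      commonPeople(spark).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

In this sketch the two "bob" rows match on every top-level field and differ only inside the nested Address, yet they are not treated as equal, which is consistent with the comparison reaching all the way down the tree.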