I have the following DataFrame (df):
ID NAME VAL
-----------
1 John 5
2 Anna 6
3 Josh 12
4 Paul 10
And I have this second DataFrame (df_ids):
ID
--
2
3
I'm doing a left_anti join in PySpark with the code below:
test = df.join(
    df_ids,
    on=['ID'],
    how='left_anti'
)
My expected output is:
ID NAME VAL
1 John 5
4 Paul 10
However, when I run the code above I get an empty DataFrame as output. What am I doing wrong?
CodePudding user response:
You can get that result with a left join followed by a filter that keeps only the rows with no match in df_ids:
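df = (df.join(df_ids, on=df["ID"] == df_ids["ID"], how='left')
      # Unmatched rows have a null df_ids["ID"] after the left join.
      .where(df_ids["ID"].isNull())
      # Keep only the columns of the original df.
      .select(df["*"]))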
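To check what the result should look like, here is a minimal plain-Python sketch (no Spark involved) of the left-anti semantics, using the sample rows from the question. The names rows, ids_to_exclude and left_anti are illustrative only and are not part of any PySpark API.

```python
# Sample rows from the question, as plain tuples (ID, NAME, VAL).
rows = [(1, "John", 5), (2, "Anna", 6), (3, "Josh", 12), (4, "Paul", 10)]

# IDs from the second DataFrame (hypothetical stand-in for df_ids).
ids_to_exclude = {2, 3}


def left_anti(left_rows, right_ids):
    """Keep the left rows whose ID (first field) has no match on the right."""
    return [row for row in left_rows if row[0] not in right_ids]


result = left_anti(rows, ids_to_exclude)
print(result)  # [(1, 'John', 5), (4, 'Paul', 10)]
```

If the Spark result is empty where this sketch keeps rows, comparing the actual contents and column types of both DataFrames is a reasonable next step.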