Create a dataframe using left_anti spark/pyspark

Time:12-16

I have the following DataFrame, df:

ID NAME VAL
-----------
1  John 5
2  Anna 6
3  Josh 12
4  Paul 10

And this second DataFrame, df_ids:

ID
--
2
3

I'm doing a left_anti join in PySpark with the code below:

test = df.join(
    df_ids,
    on=['ID'],
    how='left_anti'
)

My expected output is:

ID NAME VAL
-----------
1  John 5
4  Paul 10
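To make the expected result concrete, here is a minimal plain-Python sketch (not Spark code) of what a left anti join computes on the sample rows: keep each row of the left table whose ID does not appear in the right table, and return only the left table's columns. The helper name `left_anti` and the list-of-dicts representation are illustrative assumptions.

```python
# Plain-Python illustration of left_anti semantics (not Spark code).
# The rows mirror the sample DataFrames df and df_ids from the question.
df_rows = [
    {"ID": 1, "NAME": "John", "VAL": 5},
    {"ID": 2, "NAME": "Anna", "VAL": 6},
    {"ID": 3, "NAME": "Josh", "VAL": 12},
    {"ID": 4, "NAME": "Paul", "VAL": 10},
]
id_rows = [{"ID": 2}, {"ID": 3}]


def left_anti(left, right, key):
    """Hypothetical helper: keep left rows whose key never appears in right."""
    right_keys = {row[key] for row in right}
    # Only the left rows (and so only the left columns) are returned.
    return [row for row in left if row[key] not in right_keys]


result = left_anti(df_rows, id_rows, "ID")
for row in result:
    print(row["ID"], row["NAME"], row["VAL"])
# prints:
# 1 John 5
# 4 Paul 10
```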

However, when I run the left_anti join above, I get an empty DataFrame as output. What am I doing wrong?

Answer:

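You can get the expected result with a left outer join followed by a filter that keeps only the rows of df that found no match in df_ids: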

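# Left outer join, then keep only the rows of df with no match in df_ids
result = (df.join(df_ids, on=df["ID"] == df_ids["ID"], how='left')
    .where(df_ids["ID"].isNull())   # unmatched rows have NULL in df_ids["ID"]
    .select(df["*"]))               # keep only df's columns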
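The left-join-and-filter pattern can be illustrated in plain Python (an illustration of the semantics, not Spark code; `left_outer_join` is a hypothetical helper). Unmatched left rows are paired with `None`, which plays the role of the NULL that `df_ids["ID"].isNull()` tests for, and keeping only the left side of each pair mirrors `.select(df["*"])`.

```python
# Plain-Python illustration of "left outer join, then keep unmatched rows"
# (not Spark code). Rows mirror the sample DataFrames df and df_ids.
df_rows = [
    {"ID": 1, "NAME": "John", "VAL": 5},
    {"ID": 2, "NAME": "Anna", "VAL": 6},
    {"ID": 3, "NAME": "Josh", "VAL": 12},
    {"ID": 4, "NAME": "Paul", "VAL": 10},
]
id_rows = [{"ID": 2}, {"ID": 3}]


def left_outer_join(left, right, key):
    """Hypothetical helper: pair each left row with every matching right row,
    or with None when there is no match."""
    joined = []
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if matches:
            joined.extend((left_row, r) for r in matches)
        else:
            # None stands in for df_ids["ID"] being NULL after the Spark join
            joined.append((left_row, None))
    return joined


joined = left_outer_join(df_rows, id_rows, "ID")
# Filter like .where(df_ids["ID"].isNull()), then project like .select(df["*"])
result = [left_row for left_row, right_row in joined if right_row is None]
print([row["ID"] for row in result])  # [1, 4]
```

Duplicate keys in df_ids can multiply matched rows in the outer join, but those rows are all filtered out, so the filtered result contains the same rows a left anti join on ID would return.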