example: i have a dataframe like
i want to filter multiple condition with negation firstname == "James" & lastname == "Smith" or firstname == "Robert" & lastname == "Williams"
my required output should be i am using something like this but its not working
df = df.filter(~(df.firstname == "James") & (df.lastname == "Smith")|~(df.firstname == "Robert") & (df.lastname == "Williams"))
CodePudding user response:
You will have to apply the negation on the entire condition.
data = [("James","","Smith","36636","M",3000),
("Michael","Rose","jim","40288","M",4000),
("Robert","","Williams","42114","M",4000),
("Maria","Anne","Jones","39192","F",4000),
("Jen","Mary","Brown","60563","F",-1)]
df = spark.createDataFrame(data, ("firstname", "middlename", "lastname", "id", "gender", "salary", ))
(df.filter(~(((df.firstname == "James") & (df.lastname == "Smith")) |
((df.firstname == "Robert") & (df.lastname == "Williams"))
)
)
.show())
Output
--------- ---------- -------- ----- ------ ------
|firstname|middlename|lastname| id|gender|salary|
--------- ---------- -------- ----- ------ ------
| Michael| Rose| jim|40288| M| 4000|
| Maria| Anne| Jones|39192| F| 4000|
| Jen| Mary| Brown|60563| F| -1|
--------- ---------- -------- ----- ------ ------
CodePudding user response:
here OR condition is not giving the right output. we have to change it to AND
df_new = (df
.filter(~((F.col("firstname") == "James") & (F.col("lastname") == "Smith"))
& ~((F.col("firstname") == "Robert") & (F.col("lastname") == "Williams"))
)
)
results in below -
CodePudding user response:
Negation of A|B
is (~A)&(~B)
.
So try this:
df = df.filter((~(firstname == "James" & lastname == "Smith")) & (~(firstname == "Robert" & lastname == "Williams")))
Because the negation of C&D
is (~C)|(~D)
, you can further simplify your filtering condition as
df = df.filter((firstname != "James" | lastname != "Smith") & (firstname != "Robert" | lastname != "Williams"))