Home > Blockchain >  How to negate multiple filter condition at same time
How to negate multiple filter condition at same time

Time:12-24

example: i have a dataframe like enter image description here

i want to filter multiple condition with negation firstname == "James" & lastname == "Smith" or firstname == "Robert" & lastname == "Williams"

my required output should be enter image description here i am using something like this but its not working

df = df.filter(~(df.firstname == "James") & (df.lastname == "Smith")|~(df.firstname == "Robert") & (df.lastname == "Williams"))

CodePudding user response:

You will have to apply the negation on the entire condition.


data = [("James","","Smith","36636","M",3000),
    ("Michael","Rose","jim","40288","M",4000),
    ("Robert","","Williams","42114","M",4000),
    ("Maria","Anne","Jones","39192","F",4000),
    ("Jen","Mary","Brown","60563","F",-1)]

df = spark.createDataFrame(data, ("firstname", "middlename", "lastname", "id", "gender", "salary", ))

(df.filter(~(((df.firstname == "James") & (df.lastname == "Smith")) |
             ((df.firstname == "Robert") & (df.lastname == "Williams"))
            )
          )
   .show())

Output

 --------- ---------- -------- ----- ------ ------ 
|firstname|middlename|lastname|   id|gender|salary|
 --------- ---------- -------- ----- ------ ------ 
|  Michael|      Rose|     jim|40288|     M|  4000|
|    Maria|      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|60563|     F|    -1|
 --------- ---------- -------- ----- ------ ------ 

CodePudding user response:

here OR condition is not giving the right output. we have to change it to AND

df_new = (df
          .filter(~((F.col("firstname") == "James") & (F.col("lastname") == "Smith"))
                  & ~((F.col("firstname") == "Robert") & (F.col("lastname") == "Williams"))
                 )
         )

results in below -

Output

CodePudding user response:

Negation of A|B is (~A)&(~B).

So try this:

df = df.filter((~(firstname == "James" & lastname == "Smith")) & (~(firstname == "Robert" & lastname == "Williams")))

Because the negation of C&D is (~C)|(~D), you can further simplify your filtering condition as

df = df.filter((firstname != "James" | lastname != "Smith") & (firstname != "Robert" | lastname != "Williams"))
  • Related