Filter rows that have value from the list in PySpark


I have a list of values:

my_list = ["temp1","temp2", "temp10", "temp15"]

I am trying to delete the rows where the column "value" contains one of the values from this list.

Code I tried:

res = res.filter((res.value == 'temp1') | (res.value == 'temp2') |
                 (res.value == 'temp10') | (res.value == 'temp15'))

But is there a way to filter directly against the list instead of chaining conditions? (My list has 30 elements.)

CodePudding user response:

Use isin:

res = res.filter(res.value.isin(my_list))

Example:

res = spark.createDataFrame([('temp1',), ('x',)], ['value'])
res.show()
# +-----+
# |value|
# +-----+
# |temp1|
# |    x|
# +-----+

my_list = ["temp1", "temp2", "temp10", "temp15"]
res = res.filter(res.value.isin(my_list))

res.show()
# +-----+
# |value|
# +-----+
# |temp1|
# +-----+