Is there any PySpark function similar to isin that returns a bool?

Time:02-05

When I try to write:

F.when(
    F.col("COL1").isin("VAL1", "VAL2") & (F.col("COL2") == 1),
    F.col("SOMECOL")
)

I get an error:

AnalysisException("cannot resolve '('VAL1' AND (COL2 = 1))' due to data type mismatch: differing types in '('VAL1' AND (COL2.

How can I replace the isin method so that this works?

CodePudding user response:

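You can use F.array_contains. The values have to be wrapped in F.lit, because F.array treats plain strings as column names:

F.when(
    # F.lit turns each string into a literal; bare strings would be looked up as columns
    F.array_contains(F.array(F.lit("VAL1"), F.lit("VAL2")), F.col("COL1")) & (F.col("COL2") == 1),
    F.col("SOMECOL")
)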

CodePudding user response:

Yes. In PySpark, isin is a method on a Column, and it already returns a boolean Column indicating whether each value in the input column is in the specified set of values. That means it can be combined with other conditions using & and |.

For example:

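from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# isin is a Column method, so there is nothing to import for it
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "John"), (2, "Jane"), (3, "Jim")], ["id", "name"])
df.where(col("name").isin("John", "Jane")).show()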

This will return a dataframe with only the rows where name is either "John" or "Jane".
