When I try to write:
F.when(
F.col("COL1").isin("VAL1", "VAL2") & (F.col("COL2") == 1),
F.col("SOMECOL")
)
I get this error:
AnalysisException("cannot resolve '('VAL1' AND (COL2 = 1))' due to data type mismatch: differing types in '('VAL1' AND (COL2.
How can I replace the isin method so that this works?
CodePudding user response:
You can use F.array_contains:
F.when(
# F.array treats bare strings as column names, so wrap the literal values in F.lit
F.array_contains(F.array(F.lit("VAL1"), F.lit("VAL2")), F.col("COL1")) & (F.col("COL2") == 1),
F.col("SOMECOL")
)
CodePudding user response:
Yes, in PySpark you can call the isin method on a column. It returns a boolean column indicating whether each value in the input column is in the specified set of values.
For example:
# isin is a method on Column, so only col needs to be imported
from pyspark.sql.functions import col
df = spark.createDataFrame([(1, "John"), (2, "Jane"), (3, "Jim")], ["id", "name"])
df.where(col("name").isin("John", "Jane")).show()
This prints a DataFrame containing only the rows where name is either "John" or "Jane".