I am trying to use case/when with boolean conditions and I am getting a syntax error, cannot resolve "and", after the second when. Any suggestion as to what I am missing here?
df.withColumn("Keep_dropped")
,when((col("TARGET_ID")===col("SOURCE_ID"))
.and(col("ingest_sr")===("COS")
.and(col("timeinseconds")>(lastexectimeTS),
true)
.when((col("TARGET_ID")!=col("SOURCE_ID"))
.and(col("ingest_sr").isNull
.and(col("timeinseconds").isNull),true)
.otherwise(false)
)
CodePudding user response:
The problem is that if you take a look at the first condition of your second when, you're using !=, which is a plain Scala operator and returns a Boolean, and Boolean does not have a method called and. You should use the Spark Column operator =!= instead (the older !== spelling also exists but has been deprecated since Spark 2.0), like this:
df.withColumn("Keep_dropped")
.when((col("TARGET_ID") === col("SOURCE_ID"))
.and(col("ingest_sr") === ("COS")
.and(col("timeinseconds") > (lastexectimeTS),
true)
.when((col("TARGET_ID") !== col("SOURCE_ID")) // note the !== here
.and(col("ingest_sr").isNull
.and(col("timeinseconds").isNull),true)
.otherwise(false)
)
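For completeness, here is a minimal, self-contained sketch of the same fix that you can paste into a spark-shell. The column names and lastexectimeTS come from the question, but the sample rows and the 1000L threshold are made up for illustration only:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[*]").appName("when-demo").getOrCreate()
import spark.implicits._

val lastexectimeTS = 1000L // placeholder threshold, assumed numeric here

val df = Seq(
  (1, 1, Option("COS"), Option(2000L)),
  (1, 2, Option.empty[String], Option.empty[Long])
).toDF("TARGET_ID", "SOURCE_ID", "ingest_sr", "timeinseconds")

// col("a") != col("b") is plain Scala != and yields a Boolean, which has no .and(...);
// col("a") =!= col("b") yields a Column, which does.
val result = df.withColumn("Keep_dropped",
  when((col("TARGET_ID") === col("SOURCE_ID"))
    .and(col("ingest_sr") === "COS")
    .and(col("timeinseconds") > lastexectimeTS), true)
  .when((col("TARGET_ID") =!= col("SOURCE_ID"))
    .and(col("ingest_sr").isNull)
    .and(col("timeinseconds").isNull), true)
  .otherwise(false))

result.show()
// the first row matches the first branch and the second row matches the second, so both get true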
CodePudding user response:
The and should be wrapped inside when, like when(cond1.and(cond2)), and the parentheses need to be balanced. I believe it should be like this:
df.withColumn("Keep_dropped"), when((functions.col("TARGET_ID")===functions.col("SOURCE_ID"))
.and(col("ingest_sr")===("COS"))
.and(col("timeinseconds")>(lastexectimeTS)), true)