Using nested case when and boolean in Spark


I am trying to use case when with booleans, but I am getting a syntax issue: cannot resolve "and" after the second when. Any suggestion as to what I am missing here?

df.withColumn("Keep_dropped")
   ,when((col("TARGET_ID") === col("SOURCE_ID"))
   .and(col("ingest_sr") === ("COS")
   .and(col("timeinseconds") > (lastexectimeTS),
   true)
   .when((col("TARGET_ID") != col("SOURCE_ID"))
   .and(col("ingest_sr").isNull
   .and(col("timeinseconds").isNull), true)
   .otherwise(false)
)

CodePudding user response:

The problem is that if you take a look at the first condition of your second when, you are using !=, which is a Scala language operator and returns a plain Boolean, and Booleans do not have a method called and. You should use !== instead, which is an operator on Column in the Spark API (deprecated since Spark 2.0 in favor of =!=), like this:

df.withColumn("Keep_dropped",
   when((col("TARGET_ID") === col("SOURCE_ID"))
     .and(col("ingest_sr") === "COS")
     .and(col("timeinseconds") > lastexectimeTS), true)
   .when((col("TARGET_ID") !== col("SOURCE_ID")) // note the !== here
     .and(col("ingest_sr").isNull)
     .and(col("timeinseconds").isNull), true)
   .otherwise(false))
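
The type difference is easy to see if you pull the two operators apart. A minimal sketch (column names taken from the question, usual Spark imports assumed):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// Scala's built-in != compares the two Column objects themselves and
// yields a plain Boolean, which has no `and` method -- hence the
// "cannot resolve and" error:
val plainBoolean: Boolean = col("TARGET_ID") != col("SOURCE_ID")

// Spark's =!= (or the deprecated !==) builds a SQL inequality
// expression and yields a Column, which does have `and`:
val inequality: Column = col("TARGET_ID") =!= col("SOURCE_ID")
val combined: Column   = inequality.and(col("ingest_sr").isNull)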

CodePudding user response:

The and should be wrapped inside when, like when(cond1.and(cond2)). Balance the parentheses. I believe it should be like this:

df.withColumn("Keep_dropped", when((functions.col("TARGET_ID") === functions.col("SOURCE_ID"))
                    .and(col("ingest_sr") === "COS")
                    .and(col("timeinseconds") > lastexectimeTS), true))