Home > database >  How to use when() .otherwise function in Spark with multiple conditions
How to use when() .otherwise function in Spark with multiple conditions

Time:10-19

This is my first post so let me know if I need to give more details.

I am trying to create a boolean column, "immediate", that shows true when at least on of the columns has some data in it. If all are null then the column should be false. I am using the when() .otherwise function in spark but I'm not getting the result I would expect.

Below is the code I'm using:

val evaluation = evaluation_raw
.withColumn("immediate", 
    when(col("intended_outcome_review").isNull 
    && col("outcome").isNull 
    && col("impact").isNull 
    && col("impact_self").isNull 
    && col("next_step").isNull, 
    lit(false))
    .otherwise(lit(true)))
.select(
    col("id"),
    col("intended_outcome_review"),
    col("outcome"),
    col("impact"),
    col("impact_self"),
    col("next_step"),
    col("immediate"))

Desired outcome:

 -------- ------------------------ ------------- ------- ------------ ---------- ---------- 
|id      |intended_outcome_review |outcome      |impact |impact_self |next_step |immediate |
 -------- ------------------------ ------------- ------- ------------ ---------- ---------- 
|1568    |null                    |null         |4      |3           |null      |true      |
|1569    |null                    |null         |null   |null        |null      |false     |
|1570    |null                    |null         |null   |null        |null      |false     |
|1571    |1                       |improved coms|3      |3           |email prof|true      |
 -------- ------------------------ ------------- ------- ------------ ---------- ---------- 

Actual outcome:

 -------- ------------------------ ------------- ------- ------------ ---------- ---------- 
|id      |intended_outcome_review |outcome      |impact |impact_self |next_step |immediate |
 -------- ------------------------ ------------- ------- ------------ ---------- ---------- 
|1568    |null                    |null         |4      |3           |null      |true      |
|1569    |null                    |null         |null   |null        |null      |true      |
|1570    |null                    |null         |null   |null        |null      |false     |
|1571    |1                       |improved coms|3      |3           |email prof|true      |
 -------- ------------------------ ------------- ------- ------------ ---------- ---------- 

If anyone knows what I may be doing wrong please let me know. Thanks!

CodePudding user response:

You can use a trick and cast column.isNull() to in int and calculate sum of them. if the sum is above 0 then it's true.

.withColumn(
    'immediate',
    (
        F.col('intended_outcome_review').isNull().cast('int')  
        F.col('outcome').isNull().cast('int')  
        F.col('impact').isNull().cast('int')  
        F.col('next_step').isNull().cast('int')
    ) != 0
)

CodePudding user response:

Turns out some of the columns are converted from Null to "" when other parts of the form are filled out.

Answer below considers empty strings and Null values:

.withColumn("immediate", 
    when((col("intended_outcome_review").isNull || col("intended_outcome_review") ==="")
    && (col("outcome").isNull || col("outcome") === "")
    && (col("impact").isNull || col("outcome") === "")
    && (col("impact_self").isNull || col("impact_self") === "")
    && (col("next_step").isNull || col("next_step") === ""),
    lit(false))
    .otherwise(lit(true)))
  • Related