This is my first post so let me know if I need to give more details.
I am trying to create a boolean column, "immediate", that shows true when at least on of the columns has some data in it. If all are null then the column should be false. I am using the when() .otherwise function in spark but I'm not getting the result I would expect.
Below is the code I'm using:
val evaluation = evaluation_raw
.withColumn("immediate",
when(col("intended_outcome_review").isNull
&& col("outcome").isNull
&& col("impact").isNull
&& col("impact_self").isNull
&& col("next_step").isNull,
lit(false))
.otherwise(lit(true)))
.select(
col("id"),
col("intended_outcome_review"),
col("outcome"),
col("impact"),
col("impact_self"),
col("next_step"),
col("immediate"))
Desired outcome:
-------- ------------------------ ------------- ------- ------------ ---------- ----------
|id |intended_outcome_review |outcome |impact |impact_self |next_step |immediate |
-------- ------------------------ ------------- ------- ------------ ---------- ----------
|1568 |null |null |4 |3 |null |true |
|1569 |null |null |null |null |null |false |
|1570 |null |null |null |null |null |false |
|1571 |1 |improved coms|3 |3 |email prof|true |
-------- ------------------------ ------------- ------- ------------ ---------- ----------
Actual outcome:
-------- ------------------------ ------------- ------- ------------ ---------- ----------
|id |intended_outcome_review |outcome |impact |impact_self |next_step |immediate |
-------- ------------------------ ------------- ------- ------------ ---------- ----------
|1568 |null |null |4 |3 |null |true |
|1569 |null |null |null |null |null |true |
|1570 |null |null |null |null |null |false |
|1571 |1 |improved coms|3 |3 |email prof|true |
-------- ------------------------ ------------- ------- ------------ ---------- ----------
If anyone knows what I may be doing wrong please let me know. Thanks!
CodePudding user response:
You can use a trick and cast column.isNull() to in int and calculate sum of them. if the sum is above 0 then it's true.
.withColumn(
'immediate',
(
F.col('intended_outcome_review').isNull().cast('int')
F.col('outcome').isNull().cast('int')
F.col('impact').isNull().cast('int')
F.col('next_step').isNull().cast('int')
) != 0
)
CodePudding user response:
Turns out some of the columns are converted from Null to "" when other parts of the form are filled out.
Answer below considers empty strings and Null values:
.withColumn("immediate",
when((col("intended_outcome_review").isNull || col("intended_outcome_review") ==="")
&& (col("outcome").isNull || col("outcome") === "")
&& (col("impact").isNull || col("outcome") === "")
&& (col("impact_self").isNull || col("impact_self") === "")
&& (col("next_step").isNull || col("next_step") === ""),
lit(false))
.otherwise(lit(true)))