I'm having some trouble refactoring a spark dataframe to not use expr but instead use dataframe filters and when conditionals.
My code is this:
outDF = outDF.withColumn("MAIN_TYPE", expr
("case when 'TYPE_A' in (GROUP_A,GROUP_B,GROUP_C,GROUP_D) then 'TYPE_A'"
"when 'TYPE_B' in (GROUP_A,GROUP_B,GROUP_C,GROUP_D) then 'TYPE_B'"
"when 'TYPE_C' in (GROUP_A,GROUP_B,GROUP_C,GROUP_D) then 'TYPE_C'"
"when 'TYPE_D' in (GROUP_A,GROUP_B,GROUP_C,GROUP_D) then 'TYPE_D' else '0' end")
.cast(StringType))
The only solution that I could think of, so far is a series of individual .when().otherwise()
chains, but that would require mXn lines, where m the number of Types and n the number of Groups that I need.
Is there any better way to do this kind of operation?
Thank you very much for your time!
CodePudding user response:
So, this is how I worked this out, in case anyone is interested:
I used a helper column for the groups which I later dropped.
This is how this worked:
outDF = outDF.withColumn("Helper_Column", concat(col("Group_A"),col("Group_B"),
col("Group_C"),col("Group_D")))
outDF = outDF.withColumn("MAIN_TYPE", when(col("Helper_Column").like("%Type_A%"),"Type_A").otherwise(
when(col("Helper_Column").like("%Type_B%"),"Type_B").otherwise(
when(col("Helper_Column").like("%Type_C%"),"Type_C").otherwise(
when(col("Helper_Column").like("%Type_D%"),"Type_D").otherwise(lit("0")
)))))
outDF = outDF.drop("Helper_Column")
Hope this helps someone.