Using 'where' instead of 'expr' when filtering for values in multiple columns in Spark


I'm having some trouble refactoring a Spark DataFrame transformation to not use expr but instead use DataFrame filters and when conditionals.

My code is this:

    outDF = outDF.withColumn("MAIN_TYPE", expr(
        "case when 'TYPE_A' in (GROUP_A, GROUP_B, GROUP_C, GROUP_D) then 'TYPE_A' "
        "when 'TYPE_B' in (GROUP_A, GROUP_B, GROUP_C, GROUP_D) then 'TYPE_B' "
        "when 'TYPE_C' in (GROUP_A, GROUP_B, GROUP_C, GROUP_D) then 'TYPE_C' "
        "when 'TYPE_D' in (GROUP_A, GROUP_B, GROUP_C, GROUP_D) then 'TYPE_D' "
        "else '0' end").cast(StringType()))

The only solution I could think of so far is a series of individual .when().otherwise() chains, but that would require m×n lines, where m is the number of types and n the number of groups I need.

Is there any better way to do this kind of operation?

Thank you very much for your time!

CodePudding user response:

So, this is how I worked this out, in case anyone is interested:

I used a helper column for the groups, which I later dropped.

This is how this worked:

    from pyspark.sql.functions import col, concat, lit, when

    outDF = outDF.withColumn("Helper_Column", concat(col("Group_A"), col("Group_B"),
        col("Group_C"), col("Group_D")))
    # when() chains directly, so nested otherwise(when(...)) calls are unnecessary
    outDF = outDF.withColumn("MAIN_TYPE",
        when(col("Helper_Column").like("%Type_A%"), "Type_A")
        .when(col("Helper_Column").like("%Type_B%"), "Type_B")
        .when(col("Helper_Column").like("%Type_C%"), "Type_C")
        .when(col("Helper_Column").like("%Type_D%"), "Type_D")
        .otherwise(lit("0")))
    outDF = outDF.drop("Helper_Column")

Hope this helps someone.
