Home > Enterprise >  Spark explode in Scala - Add exploded column to the row
Spark explode in Scala - Add exploded column to the row

Time:12-13

I have a Spark Dataframe with the following contents:

Name E1 E2 E3
abc 4 5 6

I need the various E columns to become rows in a new column as shown below:

Name value EType
abc 4 E1
abc 5 E2
abc 6 E3

This answer gave me the idea of using explode and I now have the following code:

df.select($"Name", explode(array("E1", "E2", "E3")).as("value"))

The above code gives me the Name and value columns I need, but I still need a way to add in the EType column based on which value in the array passed to explode is being used to populate that particular row.

Output of the above code:

Name value
abc 4
abc 5
abc 6

How can I add the Etype column?

(I am using Spark 2.2 with Scala)

Thanks!

CodePudding user response:

You need to use melt operation here.

Note: Melt functionality is not present in pyspark, you need write that util function.

You can go thought this answer on how to implement melt function How to melt Spark DataFrame?

CodePudding user response:

Instead of exploding just value, you can explode a struct that contains the name of the column and its content, as follows:

import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

val result = df
  .select(
    col("name"), 
    explode(array(
      df.columns.filterNot(_ == "name").map(c => struct(lit(c).as("EType"), col(c).alias("value"))): _*
    ))
  )
  .select("name", "col.*")

With your input you will get as result dataframe:

 ---- ----- ----- 
|name|EType|value|
 ---- ----- ----- 
|abc |E1   |4    |
|abc |E2   |5    |
|abc |E3   |6    |
 ---- ----- ----- 

CodePudding user response:

You can use stack function for this particular case.

df.selectExpr('Name', "stack(3, E1, 'E1', E2, 'E2', E3, 'E3')").toDF('Name', 'value', 'EType').show()

df.selectExpr('Name', "stack(3, E1, 'E1', E2, 'E2', E3, 'E3')").toDF('Name', 'value', 'EType').show()
df.selectExpr('Name', "stack(3, E1, 'E1', E2, 'E2', E3, 'E3')").toDF('Name', 'value', 'EType').show()
 ---- ----- ----- 
|Name|value|EType|
 ---- ----- ----- 
| abc|    4|   E1|
| abc|    5|   E2|
| abc|    6|   E3|
 ---- ----- ----- 
  • Related