Home > Net >  Spark Explode a row containing multiple metrics using Map
Spark Explode a row containing multiple metrics using Map

Time:08-17

I am using Spark with Java and have a dataframe like this:

id | sent  | delivered | opened
--------------------------------
1  | 5     | 3         | 2
2  | 11    | 9         | 4

I want to get something like:

id  | metric_name | metric_value
--------------------------------
1   | sent        |  5
1   | delivered   |  3
1   | opened      |  2
2   | sent        |  11
2   | delivered   |  9
2   | opened      |  4

CodePudding user response:

This does the trick:

df = df.selectExpr(
  "id",
  "stack(3, 'sent', sent, 'delivered', delivered, 'opened', opened) as (metric_name, metric_value)"
)

Final output:

 --- ----------- ------------ 
|id |metric_name|metric_value|
 --- ----------- ------------ 
|1  |sent       |5           |
|1  |delivered  |3           |
|1  |opened     |2           |
|2  |sent       |11          |
|2  |delivered  |9           |
|2  |opened     |4           |
 --- ----------- ------------ 

Good luck!

CodePudding user response:

Create a struct columns combining column names and values of the columns you want to melt. Explode using inline

 df1 =df.withColumn('tab',F.array(*[F.struct(F.lit(x).alias('metric_name'), F.col(x).alias('metric_value ')) for x in df.columns if x != 'id'])).selectExpr('id','inline(tab)')

df1.show(truncate=False)

 --- ----------- ------------- 
|id |metric_name|metric_value |
 --- ----------- ------------- 
|1  |sent       |5            |
|1  |delivered  |3            |
|1  |opened     |2            |
|2  |sent       |11           |
|2  |delivered  |9            |
|2  |opened     |4            |
 --- ----------- ------------- 
  • Related