I am using Spark with Java and have a dataframe like this:
id | sent | delivered | opened
--------------------------------
1 | 5 | 3 | 2
2 | 11 | 9 | 4
I want to get something like:
id | metric_name | metric_value
--------------------------------
1 | sent | 5
1 | delivered | 3
1 | opened | 2
2 | sent | 11
2 | delivered | 9
2 | opened | 4
CodePudding user response:
This does the trick:
df = df.selectExpr(
"id",
"stack(3, 'sent', sent, 'delivered', delivered, 'opened', opened) as (metric_name, metric_value)"
)
Final output:
--- ----------- ------------
|id |metric_name|metric_value|
--- ----------- ------------
|1 |sent |5 |
|1 |delivered |3 |
|1 |opened |2 |
|2 |sent |11 |
|2 |delivered |9 |
|2 |opened |4 |
--- ----------- ------------
Good luck!
CodePudding user response:
Create a struct columns combining column names and values of the columns you want to melt. Explode using inline
df1 =df.withColumn('tab',F.array(*[F.struct(F.lit(x).alias('metric_name'), F.col(x).alias('metric_value ')) for x in df.columns if x != 'id'])).selectExpr('id','inline(tab)')
df1.show(truncate=False)
--- ----------- -------------
|id |metric_name|metric_value |
--- ----------- -------------
|1 |sent |5 |
|1 |delivered |3 |
|1 |opened |2 |
|2 |sent |11 |
|2 |delivered |9 |
|2 |opened |4 |
--- ----------- -------------