Home > Software engineering > How can I get output as below in Spark Scala
How can I get output as below in Spark Scala

Time:11-17

I have data like below:

A B C D
1 A Day D1
1 A Tim 1am
1 A Tim 3am

I need to create output like this:

A B Day Tim1 Tim2
1 A D1 1am 3am

Can you help me get this in Spark Scala?

CodePudding user response:

You can add row numbers to the duplicated C values first and then do the pivot.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{expr, first, max, row_number}

// w1 numbers the rows within each (A, B, C) group; w2 is used to
// find how many rows each group has.
val w1 = Window.partitionBy("A", "B", "C").orderBy("D")
val w2 = Window.partitionBy("A", "B", "C")

val df1 = df0
  .withColumn("row_num", row_number().over(w1))
  .withColumn("max_num", max("row_num").over(w2))
df1.show(false)

//+---+---+---+---+-------+-------+
//|A  |B  |C  |D  |row_num|max_num|
//+---+---+---+---+-------+-------+
//|1  |A  |Tim|1am|1      |2      |
//|1  |A  |Tim|3am|2      |2      |
//|1  |A  |Day|D1 |1      |1      |
//+---+---+---+---+-------+-------+
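To see what the two window functions compute, here is a plain-Scala sketch of the same step using ordinary collections (not Spark; `Row`, `rows`, and `numbered` are names I made up for illustration). Each (A, B, C) group is sorted by D, rows are numbered from 1, and the group size plays the role of `max_num`:

```scala
// Plain-Scala analogue of row_number().over(w1) and max("row_num").over(w2).
case class Row(a: Int, b: String, c: String, d: String)

val rows = List(
  Row(1, "A", "Day", "D1"),
  Row(1, "A", "Tim", "1am"),
  Row(1, "A", "Tim", "3am")
)

// Partition by (a, b, c), order by d, then number the rows in each group.
// The third element of each tuple is the group size, i.e. max_num.
val numbered: List[(Row, Int, Int)] = rows
  .groupBy(r => (r.a, r.b, r.c))
  .flatMap { case (_, group) =>
    val sorted = group.sortBy(_.d) // lexicographic, like orderBy("D") on strings
    sorted.zipWithIndex.map { case (r, i) => (r, i + 1, sorted.size) }
  }
  .toList
```

The Tim rows come out as (row 1 of 2, row 2 of 2) while the single Day row is (row 1 of 1), matching the `row_num`/`max_num` columns above.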


val df2 = df1.withColumn("C", expr("if(max_num != 1, concat(C, row_num), C)"))
df2.show(false)

//+---+---+----+---+-------+-------+
//|A  |B  |C   |D  |row_num|max_num|
//+---+---+----+---+-------+-------+
//|1  |A  |Tim1|1am|1      |2      |
//|1  |A  |Tim2|3am|2      |2      |
//|1  |A  |Day |D1 |1      |1      |
//+---+---+----+---+-------+-------+


val df3 = df2.groupBy("A", "B").pivot("C").agg(first("D"))
df3.show(false)

//+---+---+---+----+----+
//|A  |B  |Day|Tim1|Tim2|
//+---+---+---+----+----+
//|1  |A  |D1 |1am |3am |
//+---+---+---+----+----+
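For intuition, the rename-then-pivot steps can also be sketched in plain Scala collections (again not Spark; `Row`, `renamed`, and `pivoted` are illustrative names of my own). Duplicated C values get a row-number suffix, then each (A, B) key collects a map from C to D, which is what `pivot("C").agg(first("D"))` produces as columns:

```scala
// Plain-Scala analogue of the rename + pivot steps.
case class Row(a: Int, b: String, c: String, d: String)

val rows = List(
  Row(1, "A", "Day", "D1"),
  Row(1, "A", "Tim", "1am"),
  Row(1, "A", "Tim", "3am")
)

// Suffix C with the row number only when the (a, b, c) group has duplicates,
// mirroring if(max_num != 1, concat(C, row_num), C).
val renamed = rows
  .groupBy(r => (r.a, r.b, r.c))
  .flatMap { case (_, group) =>
    val sorted = group.sortBy(_.d)
    if (sorted.size == 1) sorted
    else sorted.zipWithIndex.map { case (r, i) => r.copy(c = r.c + (i + 1)) }
  }

// Pivot: one map of C -> D per (a, b) key, like groupBy("A", "B").pivot("C").
val pivoted: Map[(Int, String), Map[String, String]] =
  renamed.groupBy(r => (r.a, r.b)).map { case (key, rs) =>
    key -> rs.map(r => r.c -> r.d).toMap
  }
```

Here `pivoted((1, "A"))` holds `Day -> D1`, `Tim1 -> 1am`, `Tim2 -> 3am`, which is exactly the single pivoted row Spark returns.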