Home > Blockchain >  Transpose multiple columns to rows in Java Dataset
Transpose multiple columns to rows in Java Dataset

Time:07-21

enter image description here

The question is in the image

I have a dataset in spark which has values like

org_id, user_id, call_id, audio_bw, video_bw, share_bw

1, 1, 1, 2, 3, 4

org_id, user_id, call_id, type, bw

1, 1, 1, audio, 2

1, 1, 1, video, 3

1, 1, 1, share, 4

I am able to do it using multiple SQL queries, bit wanted to do it using code, if possible using flatMap

CodePudding user response:

Assuming that your dataset is called data, this:

data = data.select(
  col("org_id"), 
  col("user_id"), 
  col("call_id"),
  expr("stack(3, 'audio', audio_bw, 'video', video_bw, 'share', share_bw) as (type, bw)")
)

does what you want, the final output table:

 ------ ------- ------- ----- --- 
|org_id|user_id|call_id| type| bw|
 ------ ------- ------- ----- --- 
|     1|      1|      1|audio|  2|
|     1|      1|      1|video|  3|
|     1|      1|      1|share|  4|
 ------ ------- ------- ----- --- 

More about stack can be found here!

EDIT:

In case you have multiple columns to stack:

data
  .select(
    col("org_id"),
    col("user_id"),
    col("call_id"),
    // concatinate your data
    expr("concat_ws(' - ', audio_rx_bw, audio_tx_bw)").as("audio"),
    expr("concat_ws(' - ', video_rx_bw, video_tx_bw)").as("video"),
    expr("concat_ws(' - ', share_rx_bw, share_tx_bw)").as("share")
  )
  .select(
    col("org_id"),
    col("user_id"),
    col("call_id"),
    // stack again, but this time for the concatenated values
    expr("stack(3, 'audio', audio, 'video', video, 'share', share) as (type, data)")
  )
  // once done with stacking, get the data in your desired format
  .withColumn("rx_bw", split(col("data"), " - ")(0))
  .withColumn("tx_bw", split(col("data"), " - ")(1))
  .drop("data")

Input:

 ------ ------- ------- ----------- ----------- ----------- ----------- ----------- ----------- 
|org_id|user_id|call_id|audio_rx_bw|audio_tx_bw|video_rx_bw|video_tx_bw|share_rx_bw|share_tx_bw|
 ------ ------- ------- ----------- ----------- ----------- ----------- ----------- ----------- 
|     1|      1|      1|          2|          3|          4|          2|          4|          5|
 ------ ------- ------- ----------- ----------- ----------- ----------- ----------- ----------- 

Output:

 ------ ------- ------- ----- ----- ----- 
|org_id|user_id|call_id| type|rx_bw|tx_bw|
 ------ ------- ------- ----- ----- ----- 
|     1|      1|      1|audio|    2|    3|
|     1|      1|      1|video|    4|    2|
|     1|      1|      1|share|    4|    5|
 ------ ------- ------- ----- ----- ----- 

for this input:

  • Related