The question is in the image
I have a dataset in spark which has values like
org_id, user_id, call_id, audio_bw, video_bw, share_bw
1, 1, 1, 2, 3, 4
org_id, user_id, call_id, type, bw
1, 1, 1, audio, 2
1, 1, 1, video, 3
1, 1, 1, share, 4
I am able to do it using multiple SQL queries, bit wanted to do it using code, if possible using flatMap
CodePudding user response:
Assuming that your dataset is called data
, this:
data = data.select(
col("org_id"),
col("user_id"),
col("call_id"),
expr("stack(3, 'audio', audio_bw, 'video', video_bw, 'share', share_bw) as (type, bw)")
)
does what you want, the final output table:
------ ------- ------- ----- ---
|org_id|user_id|call_id| type| bw|
------ ------- ------- ----- ---
| 1| 1| 1|audio| 2|
| 1| 1| 1|video| 3|
| 1| 1| 1|share| 4|
------ ------- ------- ----- ---
More about stack
can be found here!
EDIT:
In case you have multiple columns to stack:
data
.select(
col("org_id"),
col("user_id"),
col("call_id"),
// concatinate your data
expr("concat_ws(' - ', audio_rx_bw, audio_tx_bw)").as("audio"),
expr("concat_ws(' - ', video_rx_bw, video_tx_bw)").as("video"),
expr("concat_ws(' - ', share_rx_bw, share_tx_bw)").as("share")
)
.select(
col("org_id"),
col("user_id"),
col("call_id"),
// stack again, but this time for the concatenated values
expr("stack(3, 'audio', audio, 'video', video, 'share', share) as (type, data)")
)
// once done with stacking, get the data in your desired format
.withColumn("rx_bw", split(col("data"), " - ")(0))
.withColumn("tx_bw", split(col("data"), " - ")(1))
.drop("data")
Input:
------ ------- ------- ----------- ----------- ----------- ----------- ----------- -----------
|org_id|user_id|call_id|audio_rx_bw|audio_tx_bw|video_rx_bw|video_tx_bw|share_rx_bw|share_tx_bw|
------ ------- ------- ----------- ----------- ----------- ----------- ----------- -----------
| 1| 1| 1| 2| 3| 4| 2| 4| 5|
------ ------- ------- ----------- ----------- ----------- ----------- ----------- -----------
Output:
------ ------- ------- ----- ----- -----
|org_id|user_id|call_id| type|rx_bw|tx_bw|
------ ------- ------- ----- ----- -----
| 1| 1| 1|audio| 2| 3|
| 1| 1| 1|video| 4| 2|
| 1| 1| 1|share| 4| 5|
------ ------- ------- ----- ----- -----
for this input: