I need to pivot more than one column in spark scala like the exmple below : Input Dataframe Output Dataframe
- RSECT between 1 to 6 Thank you
Tried different pivot but without succesful result
CodePudding user response:
I am not sure about a solution through pivot, but here is another outside of the box solution.
Assume data
contains:
------ ----- ----
|Rax |RSECT|rste|
------ ----- ----
|CUST1 |1 |aa |
|CUST2 |2 |aa |
|CUST3 |3 |aa |
|CUST4 |4 |aa |
|CUST5 |5 |aa |
|CUST6 |6 |aa |
|CUST7 |1 |bb |
|CUST8 |2 |bb |
|CUST9 |3 |bb |
|CUST10|4 |bb |
|CUST11|5 |bb |
|CUST12|6 |bb |
------ ----- ----
We can groupBy
, collect_list
and finally use selectExpr
to extract values, as:
data
.groupBy("rste")
.agg(collect_list(array("Rax", "RSECT")).as("array"))
.selectExpr(
Array("rste") expressions: _*
)
where expressions
can be created as:
val nrElements = 6 // or you can aggregate and collect this to be your maximum number, whatever you need
var expressions = Array[String]()
for (i <- 0 until nrElements) {
expressions = expressions :
s"array[$i][0] as Rax${i 1}" :
s"array[$i][1] as RSECT${i 1}"
}
Above statement would generate this output:
Array(array[0][0] as Rax1, array[0][1] as RSECT1, array[1][0] as Rax2, array[1][1] as RSECT2, array[2][0] as Rax3, array[2][1] as RSECT3, array[3][0] as Rax4, array[3][1] as RSECT4, array[4][0] as Rax5, array[4][1] as RSECT5, array[5][0] as Rax6, array[5][1] as RSECT6)
which we can use as an SQL expression.
The complete output looks like:
---- ----- ------ ----- ------ ----- ------ ------ ------ ------ ------ ------ ------
|rste|Rax1 |RSECT1|Rax2 |RSECT2|Rax3 |RSECT3|Rax4 |RSECT4|Rax5 |RSECT5|Rax6 |RSECT6|
---- ----- ------ ----- ------ ----- ------ ------ ------ ------ ------ ------ ------
|aa |CUST1|1 |CUST2|2 |CUST3|3 |CUST4 |4 |CUST5 |5 |CUST6 |6 |
|bb |CUST7|1 |CUST8|2 |CUST9|3 |CUST10|4 |CUST11|5 |CUST12|6 |
---- ----- ------ ----- ------ ----- ------ ------ ------ ------ ------ ------ ------
Good luck!