I have the below Dataframe with each column containing a list of values of the same size
-------- -------------------- -------------------- -------------------- --------------
|Country_1| |Country_2| |Country_3| |Country_4|
-------------------- -------------------- -------------------- --------------------
|[1, 2, 3, 4, 5, 6 ] | [x1, x2, x3, x4, x5, x6 ]|[y1, y2, y3, y4, y5, y6 ] |[v1, v2, v3, v4, v5, v6 ]
I need to convert each element list into a row so that to further elaborate, from what I have seen around like this post I should use explode
function to end up somehow as below:
Country_1 Country_2 Country_3 Country_4
1 x1 y1 v1
2 x2 y2 v2
3 x3 y3 v3
4 x4 y4 v4
5 x5 y5 v5
6 x6 y6 v6
I have tried the below code but haven't made it work so far.
data.withColumn("Country_1Country_2", F.arrays_zip("Country_1","Country_2")).select(*, F.explode("Country_1Country_2").alias("tCountry_1Country_2")).select(*, "tCountry_1Country_2.Country_1", col("Country_1Country_2.Country_2")).show()
CodePudding user response:
# This is not part of the solution, just creation of the data sample
# df = spark.sql("select stack(1, array(1, 2, 3, 4, 5, 6) ,array('x1', 'x2', 'x3', 'x4', 'x5', 'x6') ,array('y1', 'y2', 'y3', 'y4', 'y5', 'y6') ,array('v1', 'v2', 'v3', 'v4', 'v5', 'v6')) as (Country_1, Country_2,Country_3,Country_4)")
df.selectExpr('inline(arrays_zip(*))').show()
--------- --------- --------- ---------
|Country_1|Country_2|Country_3|Country_4|
--------- --------- --------- ---------
| 1| x1| y1| v1|
| 2| x2| y2| v2|
| 3| x3| y3| v3|
| 4| x4| y4| v4|
| 5| x5| y5| v5|
| 6| x6| y6| v6|
--------- --------- --------- ---------