I have a dataframe with a map:
sdf = spark.createDataFrame(
    [
        (1, {'Kira': 25, 'Lilly': 15}),
        (2, {'Tom': 14}),
    ],
    ["id", "label"]
)
+---+-------------------------+
|id |label                    |
+---+-------------------------+
|1  |{Lilly -> 15, Kira -> 25}|
|2  |{Tom -> 14}              |
+---+-------------------------+
And I want to put the keys in one column and the values in another, like this:
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Kira |25 |
|1  |Lilly|15 |
|2  |Tom  |14 |
+---+-----+---+
CodePudding user response:
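The long-hand way: use the map collection functions map_keys and map_values to create the name and age columns, then use the inline function to explode the zipped arrays into one row per entry:
from pyspark.sql.functions import map_keys, map_values

(sdf.withColumn('name', map_keys('label'))      # array of the map's keys
    .withColumn('age', map_values('label'))     # array of the map's values
    .selectExpr('id', 'inline(arrays_zip(name, age))')  # one row per (name, age) pair
    .show())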
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Lilly| 15|
|  1| Kira| 25|
|  2|  Tom| 14|
+---+-----+---+
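To make the three steps easier to follow, here is a minimal plain-Python model of what the pipeline does to each row. It is only an illustration that runs without Spark, not Spark code. The helper name flatten_map_rows and the list-of-tuples representation are assumptions made up for this sketch.

```python
# Plain-Python model of the Spark pipeline above (illustration only, no Spark needed).
rows = [
    (1, {"Kira": 25, "Lilly": 15}),
    (2, {"Tom": 14}),
]


def flatten_map_rows(rows):
    """Turn (id, {name: age}) rows into one (id, name, age) row per map entry."""
    out = []
    for row_id, label in rows:
        names = list(label.keys())    # plays the role of map_keys('label')
        ages = list(label.values())   # plays the role of map_values('label')
        # arrays_zip pairs the two arrays element-wise;
        # inline then emits one output row per pair.
        for name, age in zip(names, ages):
            out.append((row_id, name, age))
    return out


for row in flatten_map_rows(rows):
    print(row)
```

The model keeps the entries in insertion order because Python dicts do. As the Spark output above shows (Lilly before Kira), Spark may return the entries of a map in a different order, so add an explicit sort if the row order matters.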