Split the map into two columns pyspark

Time:06-15

I have a DataFrame with a map column:

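from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sdf = spark.createDataFrame(
    [
        (1, {'Kira': 25, 'Lilly': 15}),
        (2, {'Tom': 14}),
    ],
    ["id", "label"],
)
sdf.show(truncate=False)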
+---+-------------------------+
|id |label                    |
+---+-------------------------+
|1  |{Lilly -> 15, Kira -> 25}|
|2  |{Tom -> 14}              |
+---+-------------------------+

And I want to put the keys in one column and the values in another, like this:

+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Kira |25 |
|1  |Lilly|15 |
|2  |Tom  |14 |
+---+-----+---+

Answer:

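Long hand: use the map collection functions map_keys and map_values to create the name and age columns as arrays, then pair them element by element with arrays_zip and use the inline generator function to explode the resulting array of structs into one row per map entry.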

from pyspark.sql.functions import map_keys, map_values
sdf.withColumn('name', map_keys('label')).withColumn('age', map_values('label')).selectExpr('id', 'inline(arrays_zip(name, age))').show()

+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Lilly| 15|
|  1| Kira| 25|
|  2|  Tom| 14|
+---+-----+---+
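To make the three steps concrete, here is a rough plain-Python model of what the pipeline does to each row. It is only an illustration, not Spark code: Python dicts stand in for the MapType column, and the function name split_map_rows is hypothetical. map_keys and map_values become two parallel lists, arrays_zip pairs them by position, and inline emits one output row per pair.

```python
# Plain-Python model of: map_keys / map_values -> arrays_zip -> inline.
# Assumption: a Python dict stands in for Spark's MapType column; no Spark needed.
rows = [
    (1, {"Kira": 25, "Lilly": 15}),
    (2, {"Tom": 14}),
]


def split_map_rows(rows):
    """Hypothetical helper: return one (id, name, age) tuple per map entry."""
    out = []
    for row_id, label in rows:
        names = list(label.keys())    # like map_keys('label')
        ages = list(label.values())   # like map_values('label'), aligned with the keys
        pairs = zip(names, ages)      # like arrays_zip(name, age): pair by position
        for name, age in pairs:       # like inline(...): one output row per pair
            out.append((row_id, name, age))
    return out


result = split_map_rows(rows)
for r in result:
    print(r)
```

One difference from Spark: arrays_zip pads shorter arrays with nulls, while Python's zip truncates. That does not matter here because the keys and values of the same map always have the same length. Also note that the order of entries within one id follows the map's internal order, which in Spark need not match the order you wrote them in (the output above lists Lilly before Kira).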