Home > database >  How to convert JSON data inside a spark dataframe into new columns
How to convert JSON data inside a spark dataframe into new columns

Time:03-11

I have a spark dataframe like

key       value
1111    {'fruit1': 'apple', 'fruit2': 'mango'}
2222    {'fruit1': 'orange'}

I want to convert the JSON (string) to new columns

key       fruit1    fruit2
1111      apple       mango
2222      orange      null

I don't want to manually specify the keys from JSON as there are more than 100 keys

CodePudding user response:

json_keys = df.selectExpr('explode(json_object_keys(value))').select(F.collect_set('col')).first()[0]
df.select('key', F.json_tuple('value', *json_keys).alias(*json_keys)).show()

 ---- ------ ------ 
| key|fruit1|fruit2|
 ---- ------ ------ 
|1111| apple| mango|
|2222|orange|  null|
 ---- ------ ------ 
  • Related