Home > Blockchain >  Append values from one column to another JSON column in the same dataframe
Append values from one column to another JSON column in the same dataframe

Time:01-04

I have some data in dataframe which looks like this:

 ----------- -------- ----------- -------------------------------- 
|       Noun| Pronoun|  Adjective|Metadata                        |
 ----------- -------- ----------- -------------------------------- 
|      Homer| Simpson|Engineer   |{"Age": "50", "Country": "USA"} | 
|      Elon | Musk   |King       |{"Age": "45", "Country": "RSA"} |
|      Bart | Lee    |Cricketer  |{"Age": "35", "Country": "AUS"} |
|      Lisa | Jobs   |Daughter   |{"Age": "35", "Country": "IND"} |
|      Joe  | Root   |Player     |{"Age": "31", "Country": "ENG"} |
 ----------- -------- ----------- -------------------------------- 

I want to append values from another column(say Adjective) into the Metadata column. So that the final dataframe would look like this:

 ----------- -------- ----------- ------------------------------------------------------------ 
|       Noun| Pronoun|  Adjective|Metadata                                                    |
 ----------- -------- ----------- ------------------------------------------------------------ 
|      Homer| Simpson|Engineer   |{"Age": "50", "Country": "USA", "Adjective": "Engineer"}    | 
|      Elon | Musk   |King       |{"Age": "45", "Country": "RSA", "Adjective": "King"}        |
|      Bart | Lee    |Cricketer  |{"Age": "35", "Country": "AUS", "Adjective": "Cricketer"}   |
|      Lisa | Jobs   |Daughter   |{"Age": "35", "Country": "IND", "Adjective": "Daughter"}    |
|      Joe  | Root   |Player     |{"Age": "31", "Country": "ENG", "Adjective": "Player"}      |
 ----------- -------- ----------- ------------------------------------------------------------ 

Please suggest how this can be implemented.

CodePudding user response:

Assuming your column Metadata contains JSON strings, you can first convert it to MapType with from_json function, then add the columns you want using map_concat and finally convert again to JSON string using to_json:

val df2 = df.withColumn(
    "Metadata",
    from_json(col("Metadata"), lit("map<string,string>"))
).withColumn(
    "Metadata",
    to_json(map_concat(col("Metadata"), map(lit("Adjective"), col("Adjective"))))
)

df2.show(false)

// ----- ------- --------- ---------------------------------------------------- 
//|Noun |Pronoun|Adjective|Metadata                                            |
// ----- ------- --------- ---------------------------------------------------- 
//|Homer|Simpson|Engineer |{"Age":"50","Country":"USA","Adjective":"Engineer"} |
//|Elon |Musk   |King     |{"Age":"45","Country":"RSA","Adjective":"King"}     |
//|Bart |Lee    |Cricketer|{"Age":"35","Country":"AUS","Adjective":"Cricketer"}|
//|Lisa |Jobs   |Daughter |{"Age":"35","Country":"IND","Adjective":"Daughter"} |
//|Joe  |Root   |Player   |{"Age":"31","Country":"ENG","Adjective":"Player"}   |
// ----- ------- --------- ---------------------------------------------------- 

This also can be done using a conversion to StructType instead of MapType but map is more generic in this case.

  • Related