I have a DataFrame in Databricks with the following columns:
OBJECTID, SingleLine
1234, sample Address
I want to create a JSON output file in the format below, using PySpark, Python, or Scala on the Databricks platform. How can I do that?
{
"records": [
{
"attributes": {
"OBJECTID": 1,
"Address": "380 New York St"
}
},
{
"attributes": {
"OBJECTID": 2,
"Address": "1 World Way"
}
}
]
}
Please help ... Thanks very much
Answer:
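If you want all rows combined into a single JSON document in the format above, you can use the query below and then write the result to an output file. Note that the column in your DataFrame is named SingleLine, so either rename it to Address first (for example with withColumnRenamed("SingleLine", "Address")) or use 'Address', SingleLine inside the expression.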
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in Databricks notebooks
dfp = spark.createDataFrame([(1, "abc"), (2, "def")], "OBJECTID int, Address string")
# named_struct (unlike map) keeps OBJECTID as an integer; collect_list gathers all rows into one array
result = dfp.selectExpr("to_json(named_struct('records', collect_list(named_struct('attributes', named_struct('OBJECTID', OBJECTID, 'Address', Address))))) AS json_output")
print(result.first()["json_output"])
#output
{"records":[{"attributes":{"OBJECTID":1,"Address":"abc"}},{"attributes":{"OBJECTID":2,"Address":"def"}}]}
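If the data is small enough to collect to the driver, another option is to build the structure in plain Python and write it with the standard json module. This is a minimal sketch, assuming the rows arrive as (OBJECTID, address) pairs, for example from df.select("OBJECTID", "SingleLine").collect() (PySpark Row objects unpack like tuples). The function names build_records_payload and write_records_json are hypothetical, and on Databricks a local path such as /dbfs/tmp/records.json is assumed to reach DBFS through the FUSE mount.

```python
import json
from typing import Iterable, Tuple


def build_records_payload(rows: Iterable[Tuple[int, str]]) -> dict:
    """Hypothetical helper: wrap (OBJECTID, address) pairs in the
    {"records": [{"attributes": {...}}]} layout from the question."""
    return {
        "records": [
            # int() keeps OBJECTID numeric in the JSON output
            {"attributes": {"OBJECTID": int(object_id), "Address": address}}
            for object_id, address in rows
        ]
    }


def write_records_json(rows: Iterable[Tuple[int, str]], path: str) -> None:
    """Hypothetical helper: serialize the payload to a single JSON file at `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_records_payload(rows), fh, indent=2)


if __name__ == "__main__":
    # Sample rows taken from the question's expected output
    sample = [(1, "380 New York St"), (2, "1 World Way")]
    print(json.dumps(build_records_payload(sample)))
```

Because collect() pulls every row into driver memory, this suits small to moderate datasets; the Spark query above also gathers everything into a single row, so neither approach is meant for very large tables.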