Converting Dataframe to json format with root level

I have a DataFrame in Databricks with the following columns:

OBJECTID, SingleLine

1234, sample Address

I want to create a JSON output file per the specification below, using PySpark or Python/Scala on the Databricks platform. How can I do that?

{
    "records": [
        {
            "attributes": {
                "OBJECTID": 1,
                "Address": "380 New York St"
            }
        },
        {
            "attributes": {
                "OBJECTID": 2,
                "Address": "1 World Way"
            }
        }
    ]
}

Please help ... Thanks very much

CodePudding user response：

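If you want all the data in the format above, you can use the query below and write the result to an output file. Note that map() requires all of its values to share one type, so OBJECTID is emitted as a string; swapping the innermost map(...) for named_struct(...) should keep it numeric.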

# `spark` is the SparkSession that Databricks notebooks provide
dfp = spark.createDataFrame([(1, "abc"), (2, "def")], "OBJECTID int, Address string")
# Nest each row under "attributes", collect all rows into one array, and wrap that array under "records"
json_df = dfp.selectExpr("to_json(map('records', collect_list(map('attributes', map('OBJECTID', OBJECTID, 'Address', Address))))) as json_output")
json_df.show(truncate=False)

#output
+-------------------------------------------------------------------------------------------------------------+
|json_output                                                                                                  |
+-------------------------------------------------------------------------------------------------------------+
|{"records":[{"attributes":{"OBJECTID":"1","Address":"abc"}},{"attributes":{"OBJECTID":"2","Address":"def"}}]}|
+-------------------------------------------------------------------------------------------------------------+
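
If the result is small enough to collect to the driver, the same nesting can also be built in plain Python, which keeps OBJECTID numeric and writes the file in one step. This is only a minimal sketch, assuming the rows are available as dictionaries (for example from [row.asDict() for row in dfp.collect()]). The names build_records_json and write_records_json and the output path are hypothetical, not part of any library.

```python
import json
import os
import tempfile


def build_records_json(rows):
    """Wrap each row dict in an "attributes" object under a root "records" array."""
    payload = {"records": [{"attributes": dict(row)} for row in rows]}
    return json.dumps(payload, indent=4)


def write_records_json(rows, path):
    """Write the nested JSON document to `path` and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_records_json(rows))
    return path


# Sample rows standing in for [row.asDict() for row in dfp.collect()]
sample_rows = [
    {"OBJECTID": 1, "Address": "380 New York St"},
    {"OBJECTID": 2, "Address": "1 World Way"},
]

# Hypothetical output location (assumption); substitute whatever path your workspace uses
output_path = write_records_json(
    sample_rows, os.path.join(tempfile.gettempdir(), "records.json")
)
```

This approach only suits data that fits in driver memory, since every row is collected before the file is written.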