Spark dataframe write to JSON showing NULL output

Time:01-14

I am extracting JSON data from an API and trying to write it to an Azure container path. The data displays correctly in the notebook, but when I write it as JSON most of the values are NULL. Any help on where I am going wrong?

import requests
from pandas import json_normalize

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer " + str(token),
}

response_get = requests.get(getURL, headers=headers)
response_final = response_get.json()
print("Type:", type(response_final))
data = json_normalize(response_final)
df = spark.createDataFrame(data)
# df.coalesce(1).write.parquet(stagingpath, mode='overwrite')
df.coalesce(1).write.json(stagingpath, mode='overwrite')

CodePudding user response:

I reproduced this in my environment and got the expected results with the process below, following the Microsoft documentation and a related SO thread:

import requests

response = requests.get('https://reqres.in/api/users?page=3')
rdd = spark.sparkContext.parallelize([response.text])
df = spark.read.json(rdd)
df.show()
dbutils.fs.mount(
    source="wasbs://[email protected]",
    mount_point="/mnt/mymountpoint",
    extra_configs={"fs.azure.sas.mycontainer.myblobstorageaccount.blob.core.windows.net": "SAS"},
)
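
The read step above hands the whole response body to Spark as a single JSON string. If the API wraps its records in an envelope, one possible variation is to emit one JSON string per record before parallelizing, so that each record becomes its own row and nested values are preserved as written by the API. This is only a sketch: the helper name `records_to_json_lines`, the `records_key` parameter, and the sample payload are hypothetical, and the Spark calls are left as comments because they need the notebook's `spark` session.

```python
import json


def records_to_json_lines(payload, records_key=None):
    """Return one JSON string per record from an already-parsed API response.

    payload: the result of response.json() (a dict or a list).
    records_key: hypothetical name of the key holding the record list,
                 e.g. "data"; leave as None when payload is the list itself.
    """
    records = payload[records_key] if records_key is not None else payload
    if isinstance(records, dict):
        # A single object is treated as one record.
        records = [records]
    return [json.dumps(record) for record in records]


# Sample payload shaped like a paginated API response (made-up values).
sample = {
    "page": 3,
    "data": [
        {"id": 1, "name": "alice", "address": {"city": "Oslo"}},
        {"id": 2, "name": "bob", "address": None},
    ],
}
json_lines = records_to_json_lines(sample, records_key="data")

# In the notebook, where `spark` is already defined:
# df = spark.read.json(spark.sparkContext.parallelize(json_lines))
# df.printSchema()
```

Checking `df.printSchema()` before writing shows which types Spark inferred for each field, which is a cheap way to spot columns that will not round-trip as expected.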

Then run the script below to write the JSON:

df.coalesce(1).write.json( "/mnt/mymountpoint/vamo.json")

Output:

Click on the folder vamo.json, then on the part-00xxx file inside it, and then on View/Edit to see the written JSON.
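
The same check can be scripted instead of done by hand. Spark writes JSON output as one object per line, so a copy of the part file can be parsed line by line and the empty values counted per field. Depending on the Spark version and writer options, null fields may be omitted from the output rather than written as `null`, so this sketch counts both cases. The function name `count_missing` is hypothetical, and the demo uses a small temporary file with made-up rows in place of a real part-00xxx file.

```python
import json
import os
import tempfile


def count_missing(path):
    """Count, per top-level field, the records where it is absent or null."""
    with open(path, encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    # Collect every field name seen in any record.
    fields = sorted({key for record in records for key in record})
    missing = {
        field: sum(1 for record in records if record.get(field) is None)
        for field in fields
    }
    return len(records), missing


# Demo on a temporary file with made-up rows, standing in for a part file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "part-demo.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write('{"id": 1, "email": "a@example.com"}\n')
        handle.write('{"id": 2, "email": null}\n')
        handle.write('{"id": 3}\n')
    total, missing = count_missing(demo_path)

print(total, missing)
```

A field whose missing count is close to the total record count is a good candidate for comparing against the raw API response.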
