I am extracting JSON data from an API and trying to write it to an Azure container path. The data displays correctly in the notebook, but when I write it out as JSON, most of the values are NULL. Any help on where I am going wrong?
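import requests
from pandas import json_normalize

# Build the request headers; token is the bearer token obtained earlier.
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer " + str(token),
}
response_get = requests.get(getURL, headers=headers)
response_final = response_get.json()
print("Type:", type(response_final))
# Flatten the nested JSON into a pandas DataFrame, then convert it to a Spark DataFrame.
data = json_normalize(response_final)
df = spark.createDataFrame(data)
## df.coalesce(1).write.parquet(stagingpath, mode='overwrite')
df.coalesce(1).write.json(stagingpath, mode='overwrite')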
CodePudding user response:
I reproduced this in my environment using the process below and got the expected results. I followed the Microsoft documentation and a related SO thread:
import requests
response = requests.get('https://reqres.in/api/users?page=3')
rdd = spark.sparkContext.parallelize([response.text])
df = spark.read.json(rdd)
df.show()
dbutils.fs.mount(
    source="wasbs://mycontainer@myblobstorageaccount.blob.core.windows.net",
    mount_point="/mnt/mymountpoint",
    extra_configs={"fs.azure.sas.mycontainer.myblobstorageaccount.blob.core.windows.net": "SAS"},
)
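The read step above hands Spark the raw JSON text instead of going through pandas, so Spark infers the schema itself. If the API wraps its records in an envelope, one possible variation is to pass one JSON string per record. This is only a sketch: the helper name to_json_lines, the records_key parameter, and the sample payload are hypothetical and not part of the original code.

```python
import json

def to_json_lines(payload, records_key=None):
    """Return one compact JSON string per record found in payload."""
    # Pick the nested list of records if a key is given, else use the payload itself.
    records = payload[records_key] if records_key else payload
    # A single object is treated as a one-record list.
    if isinstance(records, dict):
        records = [records]
    return [json.dumps(record, separators=(",", ":")) for record in records]

# Hypothetical payload shaped like a paged API response.
sample = {"page": 3, "data": [{"id": 1, "name": "a"}, {"id": 2, "name": None}]}
lines = to_json_lines(sample, records_key="data")
print(lines)
```

In a notebook, spark.read.json(spark.sparkContext.parallelize(lines)) should then give one row per record, assuming every string in lines is a complete JSON object.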
Then run the script below to write the JSON:
df.coalesce(1).write.json( "/mnt/mymountpoint/vamo.json")
Output:
Click on the folder vamo.json:
Click on the part-00xxx file:
Then click on View/Edit:
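The manual check above can also be done in code. Spark writes a directory of part files with one JSON object per line, so a short script can count how many rows have an empty or missing value per field. This is a minimal sketch under assumptions: the helper summarize_part_files and the demo directory are hypothetical, and on Databricks the mounted folder is assumed to be reachable from the driver under a local path such as /dbfs/mnt/mymountpoint/vamo.json.

```python
import glob
import json
import os
import tempfile

def summarize_part_files(output_dir):
    """Return (row_count, {field: rows where the field is null or absent})."""
    rows = []
    # Spark names its data files part-*; checksum files start with a dot and are skipped.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    rows.append(json.loads(line))
    fields = sorted({key for row in rows for key in row})
    empty = {field: sum(1 for row in rows if row.get(field) is None) for field in fields}
    return len(rows), empty

# Demo on a throwaway directory that mimics the layout of a Spark JSON output folder.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "part-00000.json"), "w", encoding="utf-8") as handle:
    handle.write('{"id": 1, "name": "a"}\n{"id": 2}\n{"id": 3, "name": null}\n')

row_count, empty_counts = summarize_part_files(demo_dir)
print(row_count, empty_counts)
```

A field that is empty in most rows here, but populated when the DataFrame is displayed in the notebook, would point at the write step rather than at the API response.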