Read Nested JSON and Write to CSV-CodePudding

I am getting a nested JSON off of an API. When I try to convert the JSON into CSV, I get attribute errors for keys. How do I only write the records from the sample data to csv?

 {"result":{"total_records":3471,"offset":0.0,"size":100.0,"records":[{"hr_case_number":"HRC0177303","requested_by":"Test Emp 1","employee_name":"Test Emp2","employee_id":"99991","artifact_type":"Competency Assessment","artifact_subtype":"Transfer Competency Assessment","artifact_date":"2022-10-30","status":"uploaded"},{"hr_case_number":"HRC0177302","requested_by":"test emp 3","employee_name":"Test Emp 4","employee_id":"192499","artifact_type":"Orientation","artifact_subtype":"Acknowledgement of Mandated Reporter Status","artifact_date":"2022-10-28","status":"uploaded"}]}}

This is what I have so far:

import requests, json
import csv
url = "https://some.com/api?offset=0&size=10000"
headers = {
  'Authorization': 'Basic c3ZjX2RhdGF',
  'Cookie': 'BIGipServerpool_sometest=a728; 
JSESSIONID=26D6FA5703B691409AA3E44E6825C816; glide_user_route=glide.a50e06d87c4640335db5b2b40400f955; glide_session_store=D9F021AC1BE6D1103FB41F87B04BCB49',
      'Content-Type':'application/json','Accept':'application/json'
    }
    payload={}
    response = requests.request("GET", url, headers=headers, data=payload)
    with open('outputfile.json', 'wb') as outf:
         outf.write(response.content)
         outf.close()
    
    #Open JSON load the data into the variable data
    with open('C:\Python\outputfile.json', 'r') as json_file:
        data = json.load(json_file)
     
    result = data['result']
     
    # open a file for writing
    data_file = open('C:\Python\outputfile.csv', 'w', newline='')
     
    # create the csv writer
    csv_writer = csv.writer(data_file)
     
    # header
    
    count = 0
     
    for records in result:
        if count == 0:
     
            # Writing headers of CSV file
            header = records.keys()
            csv_writer.writerow(header)
            count  = 1
     
        # Writing data of CSV file
        csv_writer.writerow(records.values())
     
    data_file.close()

CodePudding user response：

result in your code is a dict, you shuold loop a list(result['records']) rather than dict

d =  {"result":{"total_records":3471,"offset":0.0,"size":100.0,"records":[{"hr_case_number":"HRC0177303","requested_by":"Test Emp 1","employee_name":"Test Emp2","employee_id":"99991","artifact_type":"Competency Assessment","artifact_subtype":"Transfer Competency Assessment","artifact_date":"2022-10-30","status":"uploaded"},{"hr_case_number":"HRC0177302","requested_by":"test emp 3","employee_name":"Test Emp 4","employee_id":"192499","artifact_type":"Orientation","artifact_subtype":"Acknowledgement of Mandated Reporter Status","artifact_date":"2022-10-28","status":"uploaded"}]}}
with open('C:\Python\outputfile.csv', 'w') as f:
    csv_writer = csv.writer(f)
    for idx, record in enumerate(d['result']['records']):
        if idx == 0:
            csv_writer.writerow(record.keys())
            
        csv_writer.writerow(record.values())

CodePudding user response：

records is a dict - the data you want is in data['result']['records']. Lists don't have a keys() method either. You could potentially use the first dict in that list, but then you'd also have to check whether there was any data at all.

Usually, when writing a CSV file, you know the columns in advance. It would be common to hard code the header which has the benefit that the writer will verify records for you (better to raise an error than write garbage data) and it will still work even if you have no records at all.

Use DictWriter because your records are dictionaries and supply the predefined header. The relevant part is

    my_header = ['artifact_date', 'artifact_subtype', 'artifact_type',
        'employee_id', 'employee_name', 'hr_case_number', 'requested_by',
        'status']

# open a file for writing
    with open('C:\Python\outputfile.csv', 'w', newline='') as data_file:
        writer = csv.DictWriter(data_file, my_header)
        writer.writeheader()
        writer.writerows(data['result']['records'])