I've iterated through JSON objects returned to me via an API query, and can get the data (obj_key1) into my target_csv
, but the data is showing up in the first column, not the data
column, which I am trying to specify.
Example JSON:
[{'obj_key': 0.5,
'obj_key1': 4000,
'obj_key2': '2020-05-01',
'obj_key3': 0.0,
'obj_key4': 'US',
'obj_key5': 50000,
'obj_key6': 11,
'obj_key7': 0.0,
'obj_key8': 20000,
'obj_key9': 0.01,
'obj_key10': '6786547',
'obj_key11': None,
'obj_key12': 12},
{'obj_key': 0.4,
'obj_key1': 5000,
'obj_key2': '2020-05-02',
'obj_key3': 0.0,
'obj_key4': 'US',
'obj_key5': 657435,
'obj_key6': 112,
'obj_key7': 0.0,
'obj_key8': 678548,
'obj_key9': 0.032,
'obj_key10': '6786547',
'obj_key11': None,
'obj_key12': 13},...]
Code:
import csv
import json
import requests
token = {token}
with open('../folder/file.csv') as src, open('target_file.csv', 'w', newline='') as tgt:
reader = csv.reader(src)
writer = csv.writer(tgt)
columns = next(reader)
writer.writerow(columns)
headers = {
'Authorization': token,
'Content-Type': 'application/json; charset=utf-8',
}
for id, name, date, data in reader:
response = requests.get(f'https://{base_url}/api/endpoint?start_date=2021-05-01&end_date=2021-05-15' '&id=' str(id),
headers=headers
)
response_data = response.json()
for i in range(len(response_data)):
data = writer.writerows(map(lambda x: [response_data[i]['obj_key1']], range(1)))
Expected CSV Output:
id | name | date | data |
---|---|---|---|
6786547 | Name | 2020-05-01 | 4000 |
6786547 | Name | 2020-05-02 | 5000 |
CodePudding user response:
I'm going recommend something that looks very different, but should get the job done all the same.
The big idea for me is to split the process up into distinct chunks—it'll be much easier for everyone to understand.
Read first
I've chosen to save all the API reads into a list of dicts that will be written-out with the CSV DictWriter in the next part.
import csv
import requests
path = f'https://{base_url}/api/endpoint?start_date=2021-05-01&end_date=2021-05-15'
token = {token}
headers = {
'Authorization': token,
'Content-Type': 'application/json; charset=utf-8',
}
final_rows = []
with open('../folder/file.csv') as src:
reader = csv.reader(src)
next(reader) # discard this header, will be created from dict row
for my_id, name, date in reader: # omitted `data` from input
response = requests.get(path f'&id={my_id}', headers=headers)
# Assuming `response.json()` looks like the sample JSON you provided
for json_data in response.json():
final_row = {
'id': my_id,
'name': name,
'date': date,
'data': json_data['obj_key1']
}
final_rows.append(final_row)
I mocked up final_rows
, which looks like:
final_rows = [
{
'id': '6786547',
'name': 'Name',
'date': '2020-05-01',
'data': '4000'
},
{
'id': '6786547',
'name': 'Name',
'date': '2020-05-02',
'data': '5000'
}
]
Then, write
Now, final_rows
will almost just write itself:
with open('target_file.csv', 'w', newline='') as tgt:
writer = csv.DictWriter(tgt, fieldnames=final_rows[0].keys())
writer.writeheader()
writer.writerows(final_rows)
and I get:
id,name,date,data
6786547,Name,2020-05-01,4000
6786547,Name,2020-05-02,5000
CodePudding user response:
Here's how to implement what I suggested in my comment about calling writerow()
multiple times and passing it all four columns (id
, name
, date
, and data
). You might be able to optimize it to create the rows first and then make one call to writerows()
instead, but it's doubtful to me whether it would be worth the effort.
Note it seems odd to me that you're ignoring the date
in 'obj_key2'
of each object of the list in the response and using the one in CSV file being read — but that seems to be what you said in your question.
import csv
import json
import requests
token = {token}
with open('../folder/file.csv', 'r', newline='') as src, \
open('target_file.csv', 'w', newline='') as tgt:
reader = csv.reader(src)
writer = csv.writer(tgt)
writer.writerow(next(reader) ['data']) # Copy header row and add column.
headers = {'Authorization': token,
'Content-Type': 'application/json; charset=utf-8'}
for id, name, date, data in reader:
response = requests.get(
f'https://{base_url}/api/endpoint?start_date=2021-05-01&end_date=2021-05-15'
f'&id={id}', headers=headers)
for obj in response.json():
writer.writerow([id, name, date, obj['obj_key1']])