I have n batches, each containing 100 API requests. Batch 1 contains files [1-100], batch 2 contains files [101-200], and so on.
I want to dump each of these into a JSON file, which works fine. However, I want to dump 100k JSON responses into one file, then create a new file and dump the next 100k responses into that one.
I need a function that picks the file based on the batch number. I have tried the following:
def open_file(self, batch):
    if batch % 1000 == 0:
        filename = f"data_{batch}.json"
    else:
        filename = ""
    f = open(filename, "a")
    return f
When batch % 1000 == 0, I want to switch to a new file name (batch number 1000 means 1000 batches of 100 JSON responses, i.e. 100k in total). However, this clearly does not work: when I evaluate batch 1001, the old file opens again. How can I create one file for batches 1-999, another for batches 1000-1999, then one for 2000-2999, and so on?
Thanks
Edit: additional information
def fetch_data(self, sequence, batch=1):
    # fetch event list (event list size = 100)
    event_list = self.events(sequence)
    # open files to store data
    f = self.open_file(batch)
    # opening thread pool executor for multi-threading
    with ThreadPoolExecutor(max_workers=4) as executor:
        # self.thread_event_list multi-threads each event in event list by sending
        # an API request.
        for i, response in enumerate(executor.map(self.thread_event_list, event_list), 1):
            json.dump(response.json(), f)
            f.write("\n")
    # continue from last sequence number (used in the recursive call)
    last_sequence = response.json()["last_sequence"]
    # Recursive call, use last sequence number of the
    # event list, and continue with sequence + 1 and batch + 1
    self.fetch_data(
        sequence=last_sequence + 1,
        batch=batch + 1
    )
CodePudding user response:
import math

# Be sure that batch is never 0, otherwise this will create a file for batch #0 only.
def open_file(self, batch):
    # If you're unsure, this handles it for you.
    if batch < 1:
        batch = 1
    filename = f"data_{math.ceil(batch/1000)}.json"
    f = open(filename, "a")
    return f
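Note that math.ceil(batch/1000) groups batches 1-1000, 1001-2000, and so on. If the files should instead line up with the ranges in the question (1-999, then 1000-1999, ...), floor division is one possible alternative. A minimal sketch, assuming a hypothetical helper name file_name_for_batch and a batches_per_file parameter that are not part of the original code:

```python
def file_name_for_batch(batch, batches_per_file=1000):
    """Return the output file name for a batch number (hypothetical helper)."""
    # Floor division maps batches 0-999 to index 0, 1000-1999 to index 1, ...
    index = batch // batches_per_file
    return f"data_{index}.json"


# Show which file a few boundary batches would land in.
for batch in (1, 999, 1000, 1999, 2000):
    print(batch, file_name_for_batch(batch))
```

With batch numbering starting at 1, the first file receives 999 batches under this mapping; using (batch - 1) // batches_per_file instead would give every file exactly 1000.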
CodePudding user response:
Why not just use a range function?
if batch in range(1,1000):
    filename = f"data_{batch_name1}.json"
elif batch in range(1000,2000):
    filename = f"data_{batch_name2}.json"
elif batch in range(2000,3000):
    filename = f"data_{batch_name3}.json"
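A chain like this needs a new branch for every further 1000 batches. The same range idea can probably be expressed with arithmetic, so it covers any batch number. The sketch below is only an illustration: append_batch, directory, and batches_per_file are hypothetical names, and it assumes the responses have already been decoded (for example with response.json()) into JSON-serialisable objects.

```python
import json
import os


def append_batch(responses, batch, directory=".", batches_per_file=1000):
    """Append one batch of decoded responses to the file for its range."""
    # Start of the range this batch falls into: 0, 1000, 2000, ...
    start = (batch // batches_per_file) * batches_per_file
    path = os.path.join(directory, f"data_{start}.json")
    # Mode "a" appends; the with block closes the file after each batch.
    with open(path, "a") as f:
        for response in responses:
            json.dump(response, f)
            f.write("\n")
    return path
```

Opening the file inside a with block for each batch also avoids leaving a file handle open across calls, at the cost of reopening the file once per batch.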