I have a Python script that goes out and pulls a huge chunk of JSON data and then iterates over it to build two lists:
# Get all price data
response = c.get_price_history_every_minute(symbol)

# Build prices list
prices = list()
for i in range(len(response.json()["candles"])):
    prices.append(response.json()["candles"][i]["prices"])

# Build times list
times = list()
for i in range(len(response.json()["candles"])):
    times.append(response.json()["candles"][i]["datetime"])
This works fine, but it takes a LONG time to pull in all of the data and build the lists. I am doing some testing while building out a more complex script, and I would like to save these two lists to two files, then load the data from those files to recreate the lists on subsequent test runs, skipping the generating, iterating, and parsing of the JSON.
I have been trying the following:
# Write Price to a File
a_file = open("prices7.txt", "w")
content = str(prices)
a_file.write(content)
a_file.close()
And then in future scripts:
# Load Prices from File
from array import array

prices_test = array('d')
a_file = open("prices7.txt", "r")
prices_test = a_file.read()
The outputs from my json lists and the data loaded into the list created from the file output look identical, but when I try to do anything with the data loaded from a file it is garbage...
print(prices)
The output looks like this: [69.73, 69.72, 69.64, ... 69.85, 69.82, etc.]
print(prices_test)
The output looks identical.
If I run a simple query like:
print(prices[1], prices[2])
I get the expected output (69.73, 69.72)
If I do the same on the list created from the file:
print(prices_test[1], prices_test[2])
I get the output ( [,6 )
It is pulling every character in the string individually instead of using the comma-separated values as I would have expected...
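To make it concrete, here is a minimal check against the same prices7.txt from above, showing that read() hands back one long string, so indexing it returns single characters:
# Minimal check: read() returns one long string, so indexing gives characters
a_file = open("prices7.txt", "r")
prices_test = a_file.read()
a_file.close()
print(type(prices_test))               # <class 'str'>
print(prices_test[0], prices_test[1])  # first two characters, e.g. '[' and '6'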
I've googled every combination of search terms I could think of so any help would be GREATLY appreciated!!
CodePudding user response:
I had to do something like this before. I used pickle to do it.
import pickle


def pickle_the_data(pickle_name, list_to_pickle):
    """This function pickles a given list.

    Args:
        pickle_name (str): name of the resulting pickle.
        list_to_pickle (list): list that you need to pickle.
    """
    with open(pickle_name + '.pickle', 'wb') as pikd:
        pickle.dump(list_to_pickle, pikd)
    file_name = pickle_name + '.pickle'
    print(f'{file_name}: Created.')


def unpickle_the_data(pickle_file_name):
    """This will unpickle a pickled file.

    Args:
        pickle_file_name (str): file name of the pickle.

    Returns:
        list: when we pass a pickled list, it will return an
            unpickled list.
    """
    with open(pickle_file_name, 'rb') as pk_file:
        unpickleddata = pickle.load(pk_file)
    return unpickleddata
So first pickle your list: pickle_the_data(name_for_pickle, your_list)
Then, when you need the list back, call unpickle_the_data(name_of_your_pickle_file) and use its return value.
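A quick usage sketch, reusing the prices list built in the question (the pickle name 'prices7' here is just an example):
# Usage sketch: 'prices' is the list from the question, 'prices7' is an example name
pickle_the_data('prices7', prices)                 # creates prices7.pickle
prices_test = unpickle_the_data('prices7.pickle')
print(prices_test[1], prices_test[2])              # floats again, not single characters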
CodePudding user response:
This is what I'm trying to explain in the comments section. Note I replaced response.json() with jsonData, taking it out of each for-loop, and reduced both loops into a single one for more efficiency. Now the code should run faster.
import json


def saveData(filename, data):
    # Convert Data to a JSON String
    data = json.dumps(data)

    # Open the file, then save it
    try:
        file = open(filename, "wt")
    except OSError:
        print("Failed to save the file.")
        return False
    else:
        file.write(data)
        file.close()
        return True


def loadData(filename):
    # Open the file, then load its contents
    try:
        file = open(filename, "rt")
    except OSError:
        print("Failed to load the file.")
        return None
    else:
        data = file.read()
        file.close()

        # Data is a JSON string, so now we convert it back
        # to a Python structure:
        data = json.loads(data)
        return data
# Get all price data
response = c.get_price_history_every_minute(symbol)
jsonData = response.json()

# Build prices and times list:
#
# As you're iterating over the same "candles" index in both loops
# when building those two lists, just reduce it to a single loop
prices = list()
times = list()
for i in range(len(jsonData["candles"])):
    prices.append(jsonData["candles"][i]["prices"])
    times.append(jsonData["candles"][i]["datetime"])

# Now, when you need, just save each list like this:
saveData("prices_list.json", prices)
saveData("times_list.json", times)

# And retrieve them back when you need them later:
prices = loadData("prices_list.json")
times = loadData("times_list.json")
Btw, pickle does the same thing, but it uses binary data instead of JSON, which is probably faster for saving / loading the data. I don't know, I haven't tested it.
With JSON you have the advantage of readability, as you can open each file and read it directly, if you understand JSON syntax.
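If you want to check the speed difference yourself, here is a rough timing sketch (the sample list of floats is made up, and the file names are arbitrary; swap in your real lists):
# Rough timing sketch: compare json vs pickle when saving a list of floats.
# The sample data below is made up; replace it with your real lists.
import json
import pickle
import time

sample = [69.73 + i * 0.01 for i in range(500_000)]

start = time.perf_counter()
with open("sample.json", "w") as f:
    json.dump(sample, f)
print("json dump:", time.perf_counter() - start)

start = time.perf_counter()
with open("sample.pickle", "wb") as f:
    pickle.dump(sample, f)
print("pickle dump:", time.perf_counter() - start)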