I would like to save and load several arrays to/from a file that is stored at a URL.
Here is how I saved the file:
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.savez('./Test.npz', x=x, y=y)
Then I can successfully load the data from the local directory:
data = np.load('./Test.npz', allow_pickle=True)
print(data['x'], data['y'])
Here's how I try to load it from a URL that points to the same file:
ds = np.DataSource()
DataUrl = 'https://www.dropbox.com/s/1vpn5k3gt41nhtn/Test.npz'
DataFile = ds.open(DataUrl)
data = np.load(DataFile, allow_pickle=True)
I have also tried:
!wget -nc 'https://www.dropbox.com/s/lm5ejwf7wzo1e58/SpikeCounts112Neuron12Thetas.npz'
np.load(DataFile, allow_pickle=True)
In both cases, I get the following error:
----> 3 np.load(DataFile, allow_pickle=True)
/opt/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
437 # If the file size is less than N, we need to make sure not
438 # to seek past the beginning of the file
--> 439 fid.seek(-min(N, len(magic)), 1) # back-up
440 if magic.startswith(_ZIP_PREFIX) or magic.startswith(_ZIP_SUFFIX):
441 # zip-file (assume .npz)
UnsupportedOperation: can't do nonzero cur-relative seeks
What am I doing wrong? What is a reasonable way to load multiple NumPy arrays from a single URL?
CodePudding user response:
I think you just need to give np.load the filename, not the open DataSource object. This seems to work:
import numpy as np

url = "https://www.dropbox.com/s/1vpn5k3gt41nhtn/Test.npz"
# DataSource downloads the URL to a local copy; .name is the path of that
# local file, which np.load can open and seek within
file = np.DataSource().open(url)
data = np.load(file.name)
Now data['x'] is array([1, 2, 3]) and data['y'] is array([4, 5, 6]).
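If you'd rather not have DataSource write a local copy, here is a minimal in-memory sketch. It assumes the URL returns the raw .npz bytes; with a plain urllib request, a Dropbox share link may need ?dl=1 appended to force a direct download instead of a preview page:

import io
import urllib.request

import numpy as np

# Assumed direct-download URL (the ?dl=1 suffix is my guess at what Dropbox needs here)
url = "https://www.dropbox.com/s/1vpn5k3gt41nhtn/Test.npz?dl=1"

with urllib.request.urlopen(url) as response:
    # np.load needs a seekable file, and an HTTP response generally isn't one,
    # so read the whole payload into a BytesIO buffer first
    buffer = io.BytesIO(response.read())

data = np.load(buffer, allow_pickle=True)
print(data['x'], data['y'])

The nice thing about this route is that nothing gets written to disk; the trade-off is that the whole file has to fit in memory.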
By the way, I learned something. I thought that to get a nice plain file out of Dropbox you had to stick ?raw=1
at the end of the URL. Turns out that's not true.
Last thing, kudos for setting up your question and example so nicely. Hardly anyone does that.