I'm uploading a large file (about 2GB) to an API that accepts POST method using requests
module of Python, which results in loading the file to the memory first and increasing memory usage significantly. I believe there will be some other ways to stream the file to the API without burdening the memory. Any suggestions?
P.S.
This old way worked for me, but consumed too much memory.
file = {'file': open(path, 'rb')}
requests.post(url, files = file)
Below streaming way sees no memory gorged but returns code 400 from the server.
requests.post(url,data=open(path, 'rb'))
CodePudding user response:
Any suggestions?
Use Streaming Upload, as docs put it:
Requests supports streaming uploads, which allow you to send large streams or files without reading them into memory. To stream and upload, simply provide a file-like object for your body:
with open('massive-body', 'rb') as f: requests.post('http://some.url/streamed', data=f)
CodePudding user response:
When you pass files
arg then requests lib makes a multipart form upload. i.e. it is like submitting a form, where the file is passed as a named field (file
in your example)
I suspect the problem you saw is because when you pass a file object as data
arg, as suggested in the docs here https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads then it does a streaming upload but the file content is used as the whole http post body.
So I think the server at the other end is expecting a form with a file
field, but we're just sending the binary content of the file by itself.
What we need is some way to wrap the content of the file with the right "envelope" as we send it to the server, so that it can recognise the data we are sending.
See this issue where others have noted the same problem: https://github.com/psf/requests/issues/1584
I think the best suggestion from there is to use this additional lib, which provides streaming multipart form file upload: https://github.com/requests/toolbelt#multipartform-data-encoder
For example:
from requests_toolbelt import MultipartEncoder
import requests
encoder = MultipartEncoder(
fields={'file': ('myfilename.xyz', open(path, 'rb'), 'text/plain')}
)
response = requests.post(
url, data=encoder, headers={'Content-Type': encoder.content_type}
)