boto3 upload_fileobj uploads only 1000 objects to a S3 location


I have more than 1050 JSON files in an S3 location, each containing a field 'id'. I am looping over these JSON files and reading the ids using get_object. I then append each id to a URL to get another JSON response, which contains a field with a snapshot location, i.e., a link to download a file. I capture the downloaded object and write it to an S3 location using s3_client.upload_fileobj(BytesIO(response.content), bucket_name, api_download_file_path + file_name). All good, but every time I run this I get only 1000 CSV files in the destination S3 location when I am expecting 1050. Is this due to some limit on upload_fileobj?

Full code here:

import json
from io import BytesIO

import boto3
import requests

s3_client = boto3.client('s3')

# bucket_name, api_target_read_path, inv_avail_get_api_url, headers and
# api_download_file_path are defined elsewhere
result = s3_client.list_objects(Bucket=bucket_name, Prefix=api_target_read_path)
for res in result.get('Contents'):
    data = s3_client.get_object(Bucket=bucket_name, Key=res.get('Key'))
    contents = data['Body'].read().decode('utf-8')
    json_data = json.loads(contents)
    print(json_data['id'])
    json_id = json_data['id']
    geturl = inv_avail_get_api_url + json_id
    response = requests.get(geturl, headers=headers)
    print(response.text)
    durl = response.json()["response"]["snapshotLocation"]
    response = requests.get(durl)
    segments = durl.rpartition('/')
    file_name = str(segments[2]).split('?')[0]
    print(file_name)
    s3_client.upload_fileobj(BytesIO(response.content), bucket_name, api_download_file_path + file_name)

CodePudding user response:

You need to use a paginator if you are trying to list more than 1000 objects, as per the docs:

Some AWS operations return results that are incomplete and require subsequent requests in order to attain the entire result set. The process of sending subsequent requests to continue where a previous request left off is called pagination. For example, the list_objects operation of Amazon S3 returns up to 1000 objects at a time, and you must send subsequent requests with the appropriate Marker in order to retrieve the next page of results.

import boto3

s3 = boto3.client('s3')

# Each page yields up to 1000 objects; the paginator handles
# the continuation tokens for you.
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')

for page in pages:
    for obj in page.get('Contents', []):  # 'Contents' is absent on empty pages
        print(obj['Size'])
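
If you prefer to paginate by hand, as the quoted docs describe, a minimal sketch using list_objects_v2's continuation token (list_objects_v2 uses ContinuationToken rather than the older Marker):

kwargs = {'Bucket': 'bucket', 'Prefix': 'prefix'}
while True:
    resp = s3.list_objects_v2(**kwargs)
    for obj in resp.get('Contents', []):
        print(obj['Size'])
    # IsTruncated is True while more pages remain
    if not resp.get('IsTruncated'):
        break
    kwargs['ContinuationToken'] = resp['NextContinuationToken']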

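Applied to the question's loop, a minimal sketch (reusing bucket_name, api_target_read_path, inv_avail_get_api_url, headers, and api_download_file_path from the question, assumed to be defined elsewhere):

import json
from io import BytesIO

import boto3
import requests

s3_client = boto3.client('s3')

# Iterating over every page processes all 1050+ keys,
# not just the first 1000.
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name, Prefix=api_target_read_path):
    for res in page.get('Contents', []):
        data = s3_client.get_object(Bucket=bucket_name, Key=res['Key'])
        json_data = json.loads(data['Body'].read().decode('utf-8'))
        response = requests.get(inv_avail_get_api_url + json_data['id'], headers=headers)
        durl = response.json()["response"]["snapshotLocation"]
        response = requests.get(durl)
        file_name = durl.rpartition('/')[2].split('?')[0]
        s3_client.upload_fileobj(BytesIO(response.content), bucket_name,
                                 api_download_file_path + file_name)

upload_fileobj itself imposes no limit on the number of objects you can write; the 1000-file cap came entirely from the unpaginated list_objects call.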