AWS S3, get list of objects per suffix/month

Time:03-18

Is it possible to count the new objects that arrived in our S3 bucket per suffix and month, based on the Last Modified date? Assume this is the content of our bucket:

Name            Last modified
pic01.jpg       2022-01-20
pic02.jpg       2022-01-22
doc01.pdf       2022-01-23
doc02.pdf       2022-01-25
doc03.pdf       2022-01-28
pic11.jpg       2022-02-01
pic12.jpg       2022-02-05
pic13.jpg       2022-02-10
doc11.pdf       2022-02-15

Desired output:

Suffix      Month       Count
jpg         2022-01     2
pdf         2022-01     3
jpg         2022-02     3
pdf         2022-02     1

Answer:

This sounded like an interesting challenge, so I wrote this:

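import boto3

# Maps (month, suffix) -> number of objects
suffixes = {}

BUCKET = 'BUCKETNAME'

s3_resource = boto3.resource('s3')

# objects.all() iterates over every object in the bucket
for obj in s3_resource.Bucket(BUCKET).objects.all():
    key = obj.key
    month = obj.last_modified.strftime("%Y-%m")
    if '.' in key:
        # The suffix is everything after the last period
        suffix = key[key.rfind('.') + 1:]
        suffixes[(month, suffix)] = suffixes.get((month, suffix), 0) + 1

# Sorted by month, then suffix; printed as: suffix month count
for key, value in sorted(suffixes.items()):
    print(key[1], key[0], value)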

It will:

  • Loop through all objects in the bucket
  • If the Key contains a period, increment a counter in a dictionary keyed by (month, suffix)
  • Sort the dictionary by month, then suffix, and print the contents
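
To check the counting logic without touching S3, the same steps can be sketched as a standalone function and run against the sample listing from the question. The function name `count_by_suffix_month` and the `(key, last_modified)` tuple input are assumptions made for this illustration; they are not part of boto3.

```python
from collections import Counter
from datetime import datetime


def count_by_suffix_month(items):
    """Count objects per (month, suffix).

    `items` is any iterable of (key, last_modified) pairs, where
    last_modified is a datetime. Keys without a period are skipped.
    """
    counts = Counter()
    for key, last_modified in items:
        if '.' in key:
            suffix = key[key.rfind('.') + 1:]
            month = last_modified.strftime("%Y-%m")
            counts[(month, suffix)] += 1
    return counts


# Sample data copied from the question
sample = [
    ("pic01.jpg", datetime(2022, 1, 20)),
    ("pic02.jpg", datetime(2022, 1, 22)),
    ("doc01.pdf", datetime(2022, 1, 23)),
    ("doc02.pdf", datetime(2022, 1, 25)),
    ("doc03.pdf", datetime(2022, 1, 28)),
    ("pic11.jpg", datetime(2022, 2, 1)),
    ("pic12.jpg", datetime(2022, 2, 5)),
    ("pic13.jpg", datetime(2022, 2, 10)),
    ("doc11.pdf", datetime(2022, 2, 15)),
]

counts = count_by_suffix_month(sample)
for (month, suffix), count in sorted(counts.items()):
    print(suffix, month, count)
```

With a real bucket, the pairs could be produced by a generator such as `((o.key, o.last_modified) for o in s3_resource.Bucket(BUCKET).objects.all())`.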

The output (suffix, month, count) is:

jpg 2014-03 1
yaml 2021-02 1
overlay 2021-03 1
txt 2021-06 1
py 2021-07 1
txt 2021-09 1
py 2021-10 1
jpg 2022-03 2
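
One edge case worth noting: taking everything after the last period in the full Key gives odd suffixes when a folder name contains a period (for example, `my.folder/readme` would yield `folder/readme`). A possible alternative, sketched here with a hypothetical helper named `key_suffix`, is to look only at the final path segment. Note that this variant treats dotfiles such as `.bashrc` as having no suffix, which differs from the loop above.

```python
import posixpath


def key_suffix(key):
    """Return the suffix of the last path segment of an S3 key, or None.

    S3 keys use '/' as the conventional separator, so posixpath is used
    regardless of the local operating system.
    """
    name = posixpath.basename(key)
    _, ext = posixpath.splitext(name)
    # ext includes the leading period (e.g. '.jpg'); strip it
    return ext[1:] or None


for example in ("photos/pic01.jpg", "my.folder/readme", "backup/archive.tar.gz"):
    print(example, "->", key_suffix(example))
```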