Is it possible to calculate the number of new objects that arrived in our S3 bucket per suffix and month, based on the Last Modified date? Assume this is the content of our bucket:
Name Last modified
pic01.jpg 2022-01-20
pic02.jpg 2022-01-22
doc01.pdf 2022-01-23
doc02.pdf 2022-01-25
doc03.pdf 2022-01-28
pic11.jpg 2022-02-01
pic12.jpg 2022-02-05
pic13.jpg 2022-02-10
doc11.pdf 2022-02-15
Desired output:
Suffix Month Count
jpg 2022-01 2
pdf 2022-01 3
jpg 2022-02 3
pdf 2022-02 1
CodePudding user response:
This sounded like an interesting challenge, so I wrote this:
import boto3

BUCKET = 'BUCKETNAME'
suffixes = {}  # maps (month, suffix) -> number of objects

s3_resource = boto3.resource('s3')
for obj in s3_resource.Bucket(BUCKET).objects.all():
    key = obj.key
    month = obj.last_modified.strftime("%Y-%m")
    if '.' in key:
        # Everything after the last period is treated as the suffix
        suffix = key[key.rfind('.') + 1:]
        suffixes[(month, suffix)] = suffixes.get((month, suffix), 0) + 1

# Sorting the (month, suffix) keys orders the output by month, then suffix
for key, value in sorted(suffixes.items()):
    print(key[1], key[0], value)
It will:
- Loop through all objects in the bucket
- If the Key contains a period, increment a counter in a dictionary keyed by (month, suffix)
- Sort the dictionary entries by month, then suffix, and print them
Example output (from a different bucket than the one in the question):
jpg 2014-03 1
yaml 2021-02 1
overlay 2021-03 1
txt 2021-06 1
py 2021-07 1
txt 2021-09 1
py 2021-10 1
jpg 2022-03 2
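To check the counting logic against the sample data in the question without touching S3, the grouping step can be pulled out into a small function. This is only a sketch: the function name count_by_suffix_and_month and the listing variable are made up for illustration, and it swaps the rfind('.') approach for posixpath.splitext, which behaves slightly differently (a period that appears only in a "folder" part of the key, or a leading period as in ".hidden", is not treated as a suffix).

```python
from collections import Counter
from datetime import datetime
import posixpath


def count_by_suffix_and_month(objects):
    """Count (key, last_modified) pairs per (month, suffix).

    Hypothetical helper for illustration; `objects` is any iterable of
    (key, datetime) pairs.
    """
    counts = Counter()
    for key, last_modified in objects:
        # splitext returns (root, '.ext'); keys without an extension give ''
        suffix = posixpath.splitext(key)[1].lstrip('.')
        if suffix:
            counts[(last_modified.strftime("%Y-%m"), suffix)] += 1
    return counts


# Sample data copied from the question
listing = [
    ("pic01.jpg", datetime(2022, 1, 20)),
    ("pic02.jpg", datetime(2022, 1, 22)),
    ("doc01.pdf", datetime(2022, 1, 23)),
    ("doc02.pdf", datetime(2022, 1, 25)),
    ("doc03.pdf", datetime(2022, 1, 28)),
    ("pic11.jpg", datetime(2022, 2, 1)),
    ("pic12.jpg", datetime(2022, 2, 5)),
    ("pic13.jpg", datetime(2022, 2, 10)),
    ("doc11.pdf", datetime(2022, 2, 15)),
]

counts = count_by_suffix_and_month(listing)
for (month, suffix), count in sorted(counts.items()):
    print(suffix, month, count)
```

With a real bucket, the same function should accept a generator such as ((o.key, o.last_modified) for o in s3_resource.Bucket(BUCKET).objects.all()), reusing the names from the script above.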