Grab latest AWS S3 Folder Object name with AWS CLI

Time:02-05

I tried the approach from this post to find the last modified file and then used awk to extract the folder that contains it: Get last modified object from S3 using AWS CLI

But this isn't ideal for more than 1000 folders and, according to the documentation, should be failing. I have 2000 folder objects I need to search through. My desired folder will always begin with a D followed by an incrementing number, e.g. D1200.

The results from the answer led me to creating this call which works:

aws s3 ls main.test.staging/General_Testing/Results/ --recursive | sort | tail -n 1 | awk '{print $4}'

but it takes over 40 seconds to search through thousands of files, and I then need to parse the output with a regex to find the folder object rather than the last file modified within it. Also, if I try the following to find my desired folder (which is the object directly under the Results object):

aws s3 ls main.test.staging/General_Testing/Results/ | sort | tail -1

Then my output will be D998 because the sort function will order folder names like this:

D119
D12
D13

This is because D12 is technically greater than D119: its second digit is a 2, where D119 has a 1. Following this logic, there's no way I can use that call to reliably retrieve the highest-numbered folder, and therefore the last one created. Note that folder objects that contain files do not have a Last Modified tag that one can query.

To be clear about my question: what call can I use to look through a large number of S3 objects to find the highest-numbered folder object? Preferably the answer is fast, works with 1000 objects, and doesn't require a regex breakdown.

Answer:

I wonder whether you can use a list of CommonPrefixes to overcome your problem of having many folders?

Try this command:

aws s3api list-objects-v2 --bucket main.test.staging --delimiter '/' --prefix 'General_Testing/Results/' --query CommonPrefixes --output text

(Note that it uses s3api rather than s3.)

It should provide a list of 'folders'. I don't know whether it has a limit on the number of 'folders' returned.

As for sorting D119 before D12, this is because sort compares the names as strings, character by character. The output is perfectly correct for a string sort.
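The string-sorting behaviour can be reproduced locally without touching S3. The snippet below is only an illustration: the folders helper is hypothetical and simply prints the sample names from the question, plus D1200 as the folder that should win. LC_ALL=C is set so the comparison is done byte by byte regardless of the local locale.

```shell
# Hypothetical sample data: folder names from the question, plus D1200.
folders() { printf '%s\n' D119 D12 D13 D998 D1200; }

# Plain string sort compares character by character, so D998 ends up
# last even though D1200 is the highest number.
folders | LC_ALL=C sort
# D119
# D12
# D1200
# D13
# D998
```

Piping this through tail -n 1 therefore returns D998, which matches the result described in the question.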

To sort by the number portion, you can likely use "version sorting". See: How to sort strings that contain a common prefix and suffix numerically from Bash?
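As a rough sketch of how the two suggestions might be combined: the latest_folder function below is a hypothetical helper, not part of the AWS CLI. It assumes a sort that supports -V (GNU coreutils does; other implementations may not) and that the input consists of prefixes such as General_Testing/Results/D119/, separated by newlines or tabs.

```shell
# Hypothetical helper: reads S3 prefixes on stdin and prints the
# highest-numbered folder name.
latest_folder() {
  tr '\t' '\n' |               # text output may separate items with tabs
    sed 's:/*$::; s:.*/::' |   # drop trailing slash, keep last path part
    sort -V |                  # version sort: D12 < D119 < D998 < D1200
    tail -n 1
}

# Possible usage with the command from this answer (not run here):
#   aws s3api list-objects-v2 --bucket main.test.staging --delimiter '/' \
#     --prefix 'General_Testing/Results/' \
#     --query 'CommonPrefixes[].Prefix' --output text | latest_folder
```

Because only the common prefixes are listed, the files inside each folder are never sorted, which should avoid both the slow recursive listing and the regex step.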
