I tried using the approach from the post "Get last modified object from S3 using AWS CLI" to find the last modified file, then using awk to extract the folder it is contained in.
But this isn't ideal for more than 1000 folders and, according to the documentation, should fail. I have 2000 folder objects I need to search through. The folder I want will always begin with a D followed by an incrementing number, e.g. D1200.
The answer to that post led me to this command, which works:
aws s3 ls main.test.staging/General_Testing/Results/ --recursive | sort | tail -n 1 | awk '{print $4}'
but it takes over 40 seconds to search through thousands of files, and I then need to parse the output with a regex to get the folder object rather than the last modified file inside it. Also, if I try a similar call to find my desired folder (the object directly under the Results/ prefix):
aws s3 ls main.test.staging/General_Testing/Results/ | sort | tail -1
Then my output will be D998, because sort orders the folder names like this:
D119
D12
D13
This is because, compared as strings, D12 is greater than D119: the first digits match, and 2 is greater than 1 in the second digit position. Because of this ordering, there's no way I can use that call to reliably retrieve the highest-numbered folder, and therefore the last one created. Note also that folder objects that contain files do not have a Last Modified tag that can be used in a query.
To be clear, my question is: what call can I use to search a large number of S3 objects and find the highest-numbered folder object? Ideally the answer is fast, works with more than 1000 objects, and doesn't require breaking the output down with a regex.
Answer:
I wonder whether you can use a list of CommonPrefixes to overcome your problem of having many folders?
Try this command:
aws s3api list-objects-v2 --bucket main.test.staging --delimiter '/' --prefix 'General_Testing/Results/' --query CommonPrefixes --output text
(Note that it uses s3api rather than s3.)
It should provide a list of 'folders'. I don't know whether it has a limit on the number of 'folders' returned.
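A minimal sketch of the same call wrapped in a shell function, assuming the bucket and prefix from the question; the helper name list_result_folders is hypothetical. On the limit question: the AWS CLI reference describes list-objects-v2 as a paginated operation that, by default, issues as many API calls as needed to return the whole result set (--no-paginate turns this off), so the 1000-keys-per-request cap should not truncate the list. Querying CommonPrefixes[].Prefix returns just the prefix strings, which text output separates with tabs, hence the tr.

```shell
#!/bin/sh
# list_result_folders (hypothetical helper): print each "folder"
# (CommonPrefix) directly under the Results/ prefix, one per line.
# Bucket and prefix are taken from the question.
list_result_folders() {
  # --delimiter '/' groups keys into CommonPrefixes (the "folders").
  # --query 'CommonPrefixes[].Prefix' keeps only the prefix strings.
  # Text output separates them with tabs, so tr puts one per line.
  aws s3api list-objects-v2 \
    --bucket main.test.staging \
    --delimiter '/' \
    --prefix 'General_Testing/Results/' \
    --query 'CommonPrefixes[].Prefix' \
    --output text | tr '\t' '\n'
}
```

Calling list_result_folders prints lines such as General_Testing/Results/D12/, which can then be piped into a numeric-aware sort.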
As for D119 sorting before D2, this is because sort compares strings, and the output is perfectly correct for string sorting.
To sort by the number portion, you can likely use "version sorting". See: How to sort strings that contain a common prefix and suffix numerically from Bash?
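A minimal sketch of version sorting, assuming GNU coreutils sort (its -V / --version-sort option compares runs of digits numerically). The helper name highest_folder is hypothetical. It avoids any regex parsing: the full prefixes can be sorted directly because they share the same leading path.

```shell
#!/bin/sh
# highest_folder (hypothetical helper): read one folder name or prefix per
# line on stdin and print the one whose number is largest.
highest_folder() {
  # Version sort orders D2 < D12 < D119, unlike plain string sort,
  # which orders D119 < D12 < D2. The last line is the highest number.
  sort -V | tail -n 1
}

# Example with folder names from the question; prints D998.
printf '%s\n' D119 D12 D13 D998 | highest_folder
```

Combined with the CommonPrefixes call, this would look like: aws s3api list-objects-v2 --bucket main.test.staging --delimiter '/' --prefix 'General_Testing/Results/' --query 'CommonPrefixes[].Prefix' --output text | tr '\t' '\n' | sort -V | tail -n 1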