I have an S3 bucket with nearly duplicate files:
If I run the AWS CLI, I get what look like identical file paths, with sizes that differ by only a few bytes:
2021-09-23 16:36:36 134626 Original/53866358.xml
2021-09-23 16:36:36 134675 Original/53866358.xml
If I look at the individual object pages, both have the same key:
The only difference is that one has an encoded ASCII carriage return at the end of its Object URL. Presumably, this is the larger file. My question is: how can I get a unique reference to each of these using the AWS S3 CLI? I'd like to delete the ones with the carriage return at the end.
CodePudding user response:
This is an interesting problem. To lay the groundwork for how my solution helps, I recreated the issue with a simple Python script:
import boto3
s3 = boto3.client('s3')
s3.put_object(Bucket='example-bucket', Key='temp/key', Body=b'normal key')
s3.put_object(Bucket='example-bucket', Key='temp/key\r', Body=b'this is not the normal key')
From there, you can see the issue as you describe:
$ aws s3 ls s3://example-bucket/temp/
2021-12-03 20:14:45 10 key
2021-12-03 20:14:45 26 key
You can list the objects in more detail using the CLI (some fields have been removed from the output here):
$ aws s3api list-objects --bucket example-bucket --prefix temp/
{
"Contents": [
{
"Key": "temp/key",
"Size": 10
},
{
"Key": "temp/key\r",
"Size": 26
}
]
}
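The same listing can be done from Python, where `repr()` makes the trailing carriage return visible. This is a minimal sketch, not part of the original answer: the helper name `find_keys_with_trailing_cr` is hypothetical, and it assumes a boto3 S3 client is passed in.

```python
def find_keys_with_trailing_cr(s3_client, bucket, prefix=""):
    """Return (key, size) pairs for objects whose key ends in a carriage return.

    Hypothetical helper: `s3_client` is assumed to be a boto3 S3 client.
    """
    suspicious = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("\r"):
                suspicious.append((obj["Key"], obj["Size"]))
    return suspicious


# Example usage (needs boto3 and AWS credentials; the bucket name is a placeholder):
#   import boto3
#   hits = find_keys_with_trailing_cr(boto3.client("s3"), "example-bucket", "temp/")
#   for key, size in hits:
#       print(repr(key), size)  # repr() shows the otherwise invisible '\r'
```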
A script would be the easiest way to remove the object with the CR in its key name, but you can also delete it with the CLI, albeit with somewhat awkward syntax:
## If you're using Unix or Mac
$ aws s3api delete-object --cli-input-json '{"Bucket": "example-bucket", "Key": "temp/key\r"}'
## If you're using Windows:
C:\> aws s3api delete-object --cli-input-json "{""Bucket"": ""example-bucket"", ""Key"": ""temp/key\r""}"
Note the syntax required to quote the JSON object, and to escape the quotes on Windows.
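The JSON form works because the two-character escape `\r` inside a JSON string is decoded into a real carriage return. A quick way to convince yourself, sketched here with Python's standard `json` module (the variable names are just for illustration):

```python
import json

# A raw string keeps the backslash and the 'r' as two separate characters,
# which is what a single-quoted shell argument hands to the CLI.
cli_input = r'{"Bucket": "example-bucket", "Key": "temp/key\r"}'

params = json.loads(cli_input)

# After JSON decoding, the key ends in one real carriage-return character.
print(repr(params["Key"]))
print(params["Key"].endswith("\r"))
```

Passing the same text to a plain `--key` argument in single quotes would, in a POSIX shell, send a literal backslash followed by `r` instead, which does not match the stored key.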
From there, it's simple to verify this worked as expected:
$ aws s3 ls s3://example-bucket/temp/
2021-12-03 20:14:45 10 key
$ aws s3 cp s3://example-bucket/temp/key final_check.txt
download: s3://example-bucket/temp/key to ./final_check.txt
$ cat final_check.txt
normal key
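As mentioned above, a script is the easier route when many keys are affected. The following is only a sketch under some assumptions: the function name `delete_keys_with_trailing_cr` and the `dry_run` flag are hypothetical, and `s3_client` is assumed to be a boto3 S3 client. It collects the matching keys first, then deletes them one at a time.

```python
def delete_keys_with_trailing_cr(s3_client, bucket, prefix="", dry_run=True):
    """Delete objects whose key ends in '\\r'; return the list of matching keys.

    Hypothetical helper. With dry_run=True (the default) nothing is deleted,
    so the matches can be reviewed before anything is removed.
    """
    targets = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("\r"):
                targets.append(obj["Key"])

    if not dry_run:
        for key in targets:
            # The key is passed as a Python string, so the trailing '\r'
            # reaches S3 unchanged and no shell quoting is involved.
            s3_client.delete_object(Bucket=bucket, Key=key)
    return targets


# Example usage (needs boto3 and AWS credentials; the bucket name is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   print(delete_keys_with_trailing_cr(s3, "example-bucket", "Original/"))  # review
#   delete_keys_with_trailing_cr(s3, "example-bucket", "Original/", dry_run=False)
```

Running it once with the default `dry_run=True` and checking the returned keys is a cheap safeguard, since deletes on a bucket without versioning cannot be undone.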