Downloading text presigned s3 URL with wget returns binary

Time:08-02

I'm trying to programmatically download a file from a pre-signed S3 URL. I know that the file is ASCII text. When I paste the URL into Chrome, the file looks as I would expect (see below). With wget, however, the downloaded file is binary.

I looked through previous posts about this but couldn't find much that helped. They suggest adding quotes around the URL, but my URL does not contain special characters. Some of the posts I checked:

[Image: browser showing the HTML file]

But attempting to download the file with wget shows the compressed contents:

$ wget -qO- https://example-bucket.s3.amazonaws.com/example_html_br.html | hexdump -C
00000000  1f 6e 00 00 1d 07 ee be  1d 1b 46 77 12 aa 15 78  |.n........Fw...x|
00000010  a8 dc d4 d4 5b 83 cc a0  a5 81 96 1c b0 b7 d5 6d  |....[..........m|
00000020  29 46 f6 fa 6e 63 eb 29  ea aa 82 c8 25 a8 42 91  |)F..nc.)....%.B.|
00000030  ce 1d 07 f6 06 e1 52 0f  f4 4a a9 d6 87 17 76 ff  |......R..J....v.|
00000040  e1 da 01                                          |...|

You can verify this by looking at the HTTP headers:

$ wget -S https://example-bucket.s3.amazonaws.com/example_html_br.html
--2022-08-01 14:10:40--  https://example-bucket.s3.amazonaws.com/example_html_br.html
Resolving example-bucket.s3.amazonaws.com (example-bucket.s3.amazonaws.com)... 52.218.178.75
  [...]
  HTTP/1.1 200 OK
  Content-Encoding: br

The Content-Encoding: br header is what the browser acts on: it says the response body is Brotli-compressed, so the browser decompresses it before display, while wget saves the bytes as received. Either ensure that whatever component places this content in S3 doesn't compress it, or decompress the content after downloading it, as the browser does:

$ wget -qO- https://example-bucket.s3.amazonaws.com/example_html_br.html | brotli -df
<html>
<head>
<title>Example</title>
[...]

The same premise holds true if you're using pre-signed URLs.
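
For a script that doesn't know in advance whether an object was stored compressed, the approach above can be generalized as in the sketch below. The function names (content_encoding, decode_stream, fetch_decoded) are made up for illustration and are not part of wget or brotli. It assumes a POSIX shell with wget, awk, gzip and the brotli command-line tool installed, and only handles the br and gzip encodings.

```shell
#!/bin/sh
# Illustrative sketch; all function names here are hypothetical helpers.

# Read HTTP response headers on stdin (for example, the stderr of `wget -S`)
# and print the lower-cased Content-Encoding value, or nothing if absent.
# If several responses are present (redirects), the last one wins.
content_encoding() {
    tr -d '\r' |
        awk -F': *' 'tolower($1) ~ /^[ \t]*content-encoding$/ { print tolower($2) }' |
        tail -n 1
}

# Decompress stdin to stdout according to the encoding name in $1.
# Unknown or empty encodings are passed through unchanged.
decode_stream() {
    case "$1" in
        br)          brotli -dc ;;
        gzip|x-gzip) gzip -dc ;;
        *)           cat ;;
    esac
}

# Download $1 and write the decoded body to stdout.
fetch_decoded() {
    url=$1
    hdr=$(mktemp) || return 1
    body=$(mktemp) || { rm -f "$hdr"; return 1; }
    # Use a single GET: a pre-signed URL is signed for one HTTP method,
    # so a separate HEAD request (wget --spider) may be rejected.
    if wget -S -O "$body" "$url" 2> "$hdr"; then
        decode_stream "$(content_encoding < "$hdr")" < "$body"
        status=$?
    else
        status=1
    fi
    rm -f "$hdr" "$body"
    return "$status"
}
```

After sourcing these functions, fetch_decoded "$URL" > page.html would save the decoded text. Keep the URL in double quotes, since pre-signed URLs carry a query string containing & characters.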
