I'm using Perl 5.36.0 with the latest WWW::Mechanize (2.15). I want to read the Last-Modified header of the PDF file at https://web.metro.taipei/img/ALL/timetables/079a.PDF. Both curl and HTTPie return it without problems:
$ curl -i https://web.metro.taipei/img/ALL/timetables/079a.PDF
HTTP/1.1 200 OK
Content-Type: application/pdf
Last-Modified: Fri, 11 Nov 2022 16:20:50 GMT
Accept-Ranges: bytes
ETag: "93931790e9f5d81:0"
Date: Wed, 23 Nov 2022 05:24:16 GMT
Content-Length: 205866
Strict-Transport-Security: max-age=177211688
Set-Cookie: ...
$ http -p h https://web.metro.taipei/img/ALL/timetables/079a.PDF
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 205866
Content-Type: application/pdf
Date: Wed, 23 Nov 2022 05:24:52 GMT
ETag: "93931790e9f5d81:0"
Last-Modified: Fri, 11 Nov 2022 16:20:50 GMT
Set-Cookie: ...
Strict-Transport-Security: max-age=177211688
But with Perl's WWW::Mechanize (based on LWP::UserAgent), I don't get the PDF file; instead the server redirects me to the root page https://www.metro.taipei/:
(By the way, I need to work around an SSL certificate verification issue, because the https://web.metro.taipei/ web server is not configured to send its intermediate certificate. The sectigo-rsa.pem file can be obtained from https://crt.sh/?d=924467857.)
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use Data::Dumper;
use JSON;
use WWW::Mechanize;
INIT {
    # Workaround with Sectigo's intermediate CA.
    my $ua = WWW::Mechanize->new(
        agent    => 'Monitoring/0.20221123',
        ssl_opts => {
            SSL_ca_file => 'sectigo-rsa.pem',
        },
    );
    my $res = $ua->get(
        'https://web.metro.taipei/img/ALL/timetables/079a.PDF',
    );
    say $res->base;
    # You can see details of redirects with:
    say Dumper $res->redirects;
}
__END__
Now $res->base is:
$ ./monitoring-taipei-metro.pl
https://www.metro.taipei/
Also, in the dump of $res->redirects, you can see a redirect response pointing to https://www.metro.taipei/:
$VAR1 = bless( {
'_content' => '',
'_rc' => '302',
'_headers' => bless( {
'content-length' => '0',
'client-peer' => '60.244.85.177:443',
'location' => 'https://www.metro.taipei/',
Answer:
By capturing WWW::Mechanize's raw request (using mitmproxy), I found these headers in the request:
TE: deflate,gzip;q=0.3
Connection: close, TE
It seems that web.metro.taipei redirects every request to https://www.metro.taipei/ if the Connection header includes TE. Adding this line before the request avoids that:
push(@LWP::Protocol::http::EXTRA_SOCK_OPTS, SendTE => 0);
This turns off sending the TE header.
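For completeness, here is a minimal sketch of the question's script with that line applied. It reuses the URL, agent string and sectigo-rsa.pem file from the question; autocheck => 0 and the printed labels are my own additions, and I am assuming the server keeps behaving as observed above.

#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use WWW::Mechanize;

# Stop the socket layer under LWP from adding the TE headers.
# Note: this is a package variable, so it applies to every LWP
# HTTP(S) connection made by this process, not just this $ua.
push @LWP::Protocol::http::EXTRA_SOCK_OPTS, SendTE => 0;

my $url = 'https://web.metro.taipei/img/ALL/timetables/079a.PDF';

my $ua = WWW::Mechanize->new(
    agent     => 'Monitoring/0.20221123',
    autocheck => 0,    # return the response on failure instead of dying
    ssl_opts  => { SSL_ca_file => 'sectigo-rsa.pem' },
);

my $res       = $ua->get($url);
my @redirects = $res->redirects;    # empty if the server no longer redirects

if ( $res->is_success ) {
    say 'Final URL:     ', $res->base;
    say 'Redirects:     ', scalar @redirects;
    say 'Last-Modified: ', $res->header('Last-Modified') // '(not sent)';
}
else {
    warn 'Request failed: ', $res->status_line, "\n";
}

If the fix works, the final URL should be the PDF itself and the redirect count should be 0. Because the setting is global, you may want to apply it only in scripts that talk to servers with this problem.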