Strange redirection with Perl's WWW::Mechanize

Time:11-24

I'm on Perl 5.36.0 with the latest WWW::Mechanize (2.15). I want to read the Last-Modified header of the PDF file at https://web.metro.taipei/img/ALL/timetables/079a.PDF. Both curl and HTTPie handle this fine:

$ curl -i https://web.metro.taipei/img/ALL/timetables/079a.PDF
HTTP/1.1 200 OK
Content-Type: application/pdf
Last-Modified: Fri, 11 Nov 2022 16:20:50 GMT
Accept-Ranges: bytes
ETag: "93931790e9f5d81:0"
Date: Wed, 23 Nov 2022 05:24:16 GMT
Content-Length: 205866
Strict-Transport-Security: max-age=177211688
Set-Cookie: ...

$ http -p h https://web.metro.taipei/img/ALL/timetables/079a.PDF 
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 205866
Content-Type: application/pdf
Date: Wed, 23 Nov 2022 05:24:52 GMT
ETag: "93931790e9f5d81:0"
Last-Modified: Fri, 11 Nov 2022 16:20:50 GMT
Set-Cookie: ...
Strict-Transport-Security: max-age=177211688

But with Perl's WWW::Mechanize (which is based on LWP::UserAgent), I don't get the PDF file; instead, the server redirects me to the root page https://www.metro.taipei/:

(By the way, I need to work around an SSL certificate verification issue caused by a missing intermediate certificate in the https://web.metro.taipei/ web server settings. The sectigo-rsa.pem file can be obtained from https://crt.sh/?d=924467857.)

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

use Data::Dumper;
use JSON;
use WWW::Mechanize;

INIT {
    # Workaround with Sectigo's intermediate CA.
    my $ua = WWW::Mechanize->new(
        agent => 'Monitoring/0.20221123',
        ssl_opts => {
            SSL_ca_file => 'sectigo-rsa.pem'
        },
    );

    my $res = $ua->get(
        'https://web.metro.taipei/img/ALL/timetables/079a.PDF',
    );

    say $res->base;

    # You can see details of redirects with:
    say Dumper $res->redirects;
}

__END__

Now $res->base is:

$ ./monitoring-taipei-metro.pl
https://www.metro.taipei/

Also, in the dump of $res->redirects, you can see a 302 redirect response pointing to https://www.metro.taipei/:

$VAR1 = bless( {
                 '_content' => '',
                 '_rc' => '302',
                 '_headers' => bless( {
                                        'content-length' => '0',
                                        'client-peer' => '60.244.85.177:443',
                                        'location' => 'https://www.metro.taipei/',

Answer:

By capturing WWW::Mechanize's raw request (using mitmproxy), I found these headers in the request:

TE:               deflate,gzip;q=0.3
Connection:       close, TE

It seems that web.metro.taipei redirects every request to https://www.metro.taipei/ when the Connection header includes TE.

push(@LWP::Protocol::http::EXTRA_SOCK_OPTS, SendTE => 0);

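This turns off sending the TE header. Run it before the first request is made, because the option is read when the connection is opened.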
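If you don't have mitmproxy at hand, one way to check the effect of this setting is to point LWP at a throwaway local listener and print the request it receives. The following is only a minimal sketch: captured_request is a hypothetical helper written for this illustration (it is not part of LWP), it talks plain HTTP to 127.0.0.1 rather than to the real server, and it assumes LWP::UserAgent is installed and that fork is available on your platform.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use IO::Socket::INET;
use LWP::Protocol::http ();
use LWP::UserAgent;
use POSIX ();

# Hypothetical helper (not part of LWP): returns the request line and headers
# that $ua sends, captured by a throwaway listener on 127.0.0.1.
sub captured_request {
    my ($ua) = @_;

    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,        # let the OS pick a free port
        Listen    => 1,
        Proto     => 'tcp',
        Timeout   => 10,       # give up if the client never connects
    ) or die "listen failed: $!";
    my $url = 'http://127.0.0.1:' . $listener->sockport . '/';

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: send one request, then exit without running END blocks.
        close $listener;
        $ua->get($url);
        POSIX::_exit(0);
    }

    # Parent: read the request up to the blank line that ends the headers.
    my $conn = $listener->accept or die "accept failed: $!";
    my $head = '';
    while (defined(my $line = <$conn>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        $head .= "$line\n";
    }
    print {$conn} "HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n";
    close $conn;
    waitpid($pid, 0);
    return $head;
}

# Request as LWP sends it by default.
my $default = captured_request(LWP::UserAgent->new);

my $without_te = do {
    # Same setting as above, but scoped with local so it ends with this block.
    local @LWP::Protocol::http::EXTRA_SOCK_OPTS = (SendTE => 0);
    captured_request(LWP::UserAgent->new);
};

print "--- default ---\n$default";
print "--- SendTE => 0 ---\n$without_te";
```

Comparing the two dumps should show whether the TE header and the TE token in Connection disappear once SendTE is set to 0. Recent LWP::UserAgent releases also document a send_te attribute that may do the same thing per agent; check the documentation of your installed version before relying on it.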
