Why does Web::Scraper not parse script tags?

Time:09-17

I tried to scrape an HTML page with Web::Scraper, but to my surprise I did not get the contents of the script tags as I had expected.

The following example

use strict;
use warnings;
use feature 'say';   # needed for say() below
use Web::Scraper;
use Data::Dumper;

my $html = q|
<html>
  <head>
    <title>test html</title>
  </head>
  <body>
    <script>
      test script
    </script>

    <p>
      p test
    </p>

    <other>
      other test
    </other>

  </body>
</html>
|;

our $scraper = scraper {
  process 'script', "script" => 'TEXT';
  process 'p', "p" => 'TEXT';
  process 'other', "other" => 'TEXT';
};

my $data = $scraper->scrape( $html );
say Dumper $data;

gives this output:

$VAR1 = {
          'other' => ' other test ',
          'p' => ' p test ',
          'script' => ''
        };

As a hack I can rename the script tags before scraping, but I'd like to understand why Web::Scraper does not give me the content of inline scripts. Or what should I do differently?

CodePudding user response:

It works for me with an XPath expression that selects the script's text node directly:

  process '//script/text()', "script" => 'TEXT';
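
For completeness, here is a minimal self-contained sketch of that approach. The likely reason the CSS selector returns an empty string is that, for element nodes, 'TEXT' seems to go through HTML::Element's as_text, which is documented to never include text under script or style elements. Selecting the text node itself with text() sidesteps that. The helper name inline_script_text is made up for illustration, and I have not checked this against every Web::Scraper version.

```perl
use strict;
use warnings;
use feature 'say';
use Web::Scraper;

# Hypothetical helper (not part of Web::Scraper): returns the text of the
# first inline <script> element found in the given HTML string.
sub inline_script_text {
    my ($html) = @_;

    my $scraper = scraper {
        # text() selects the text node inside <script>, not the element,
        # so the value is read from the node itself.
        process '//script/text()', 'script' => 'TEXT';
    };

    my $data = $scraper->scrape($html);
    return $data->{script};
}

say inline_script_text(
    '<html><body><script>test script</script><p>p test</p></body></html>'
);
```

To collect every inline script rather than only the first, the key can be written as 'scripts[]', which makes Web::Scraper return an array reference.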