Scrape in and Search with Google App Script

Time:04-09

I've tried everything I can think of, but I'm stuck. I want to do a very simple check of whether a term occurs in the source text of a page. Specifically, I would like to know whether the term "In Stock." appears on the page.

This is my code. Very simple.

function productTitle(url) {
  url = "https://www.amazon.com/dp/B004S4ZA4K";
  var content = UrlFetchApp.fetch(url).getContentText();
  var match = content.search("In Stock.");
  Logger.log(match);
}

The result in Logger is always

Info: -1.0

But the string is definitely in the source of the website, as you can see from this picture.

And the function itself works: if I replace "In Stock." with "Amazon", for example, it returns a value that is not -1.0.

Disclaimer: I'm not a pro. I just want to make my life a bit easier. Help is highly appreciated.

Thanks Elisa

CodePudding user response:

Try it this way:

function productTitle() {
  const url = "https://www.amazon.com/dp/B004S4ZA4K"
  const content = UrlFetchApp.fetch(url).getContentText();
  const match = content.search(/In Stock/gi);
  Logger.log(match)
}

Execution log
12:40:47 AM Notice  Execution started
12:40:48 AM Info    1040984.0
12:40:50 AM Notice  Execution completed

According to the following, which uses match with the global flag to collect every occurrence, it occurs five times:

function productTitle() {
  const url = "https://www.amazon.com/dp/B004S4ZA4K"
  const content = UrlFetchApp.fetch(url).getContentText();
  const match = content.match(/In Stock/gim);
  Logger.log(JSON.stringify(match))
}
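
To make the difference between the two calls above easier to see, here is a small sketch that runs on a hard-coded string instead of the fetched page. The sample text and the countOccurrences helper are made up for illustration and are not taken from the Amazon page. search returns the index of the first match (or -1), while match with the g flag returns an array of every match (or null when there is none).

```javascript
// Hypothetical helper: counts case-insensitive occurrences of "In Stock".
function countOccurrences(text) {
  // With the g flag, match() returns an array of all matches, or null if none.
  const matches = text.match(/In\sStock/gi);
  return matches === null ? 0 : matches.length;
}

// Made-up sample text standing in for a fetched page.
const sample = '<span>In Stock.</span> <div>in stock</div> <p>Out of stock</p>';

// search() only reports where the first match starts, or -1 if there is none.
const firstIndex = sample.search(/In\sStock/i);
const total = countOccurrences(sample);

console.log(firstIndex, total);
```

In Apps Script the same helper could be fed the result of getContentText(), and Logger.log could be used in place of console.log.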

CodePudding user response:

  • Testing on my end, the target URL returns a Captcha verification page when fetched via code, but not when opened in a browser.

  • That is expected: the purpose of the Captcha wall is to prevent automated scraping.

  • I'd recommend logging the contents of the fetch response when investigating this behavior.

  • Although you may manage to “trick” Amazon’s Captcha wall (altering the User-Agent request header may work, for example), there is no guarantee that this will work consistently.

  • I’d recommend using Amazon’s Product Advertising API to get consistent results.

  • Specifically for your scenario, the “In Stock” message is located in the attribute Offers.Listings.Availability.Message of the API response. See more info here

  • In addition, once the fetch response contents match your expected results, the regex could be improved by changing it to /In\sStock/gi (as pointed out by the user Cooper), which adds the global and case-insensitive flags. (More info here)

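
The suggestion above to log the fetch response can be sketched roughly as follows. Treat it as an illustration under assumptions rather than a tested solution: the helper names (looksLikeCaptcha, containsInStock, availabilityMessages, checkStock) are made up here, and the marker strings used to spot a verification page are guesses that should be checked against whatever the log actually shows. availabilityMessages only walks the Offers.Listings.Availability.Message path on an item object that has already been parsed. Calling the Product Advertising API itself (credentials, request signing) is not shown.

```javascript
// Assumed marker strings for a bot-check page; verify them against the logged response.
const CAPTCHA_MARKERS = ['captcha', 'not a robot'];

// Returns true if the HTML looks like a verification page rather than a product page.
function looksLikeCaptcha(html) {
  const lower = html.toLowerCase();
  return CAPTCHA_MARKERS.some(function (marker) {
    return lower.indexOf(marker) !== -1;
  });
}

// Returns true if "In Stock" appears anywhere in the text (case-insensitive).
function containsInStock(html) {
  return /In\sStock/i.test(html);
}

// Collects Availability.Message values from one already-parsed API item object.
// The shape follows the path named above: Offers.Listings[].Availability.Message.
function availabilityMessages(item) {
  const listings = (item && item.Offers && item.Offers.Listings) || [];
  return listings
    .filter(function (listing) {
      return listing.Availability && listing.Availability.Message;
    })
    .map(function (listing) {
      return listing.Availability.Message;
    });
}

// Apps Script entry point: fetch the page, log what actually came back, then search it.
function checkStock() {
  const url = 'https://www.amazon.com/dp/B004S4ZA4K';
  // muteHttpExceptions lets us inspect error responses instead of throwing.
  const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  const html = response.getContentText();
  Logger.log('HTTP status: %s', response.getResponseCode());
  // Log the start of the body to see which page was really served.
  Logger.log(html.substring(0, 500));
  if (looksLikeCaptcha(html)) {
    Logger.log('Received a verification page; the search result is not meaningful.');
    return;
  }
  Logger.log('In stock: %s', containsInStock(html));
}
```

Run checkStock from the Apps Script editor and read the execution log first. The search result is only worth trusting once the logged body is the real product page and not a verification page.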