I've really tried everything, but I can't get anywhere. I would like to make a very simple evaluation of whether a term occurs in the source text of a page. I would like to know whether the term "In Stock." is contained on the page.
This is my code. Very simple.
function productTitle(url) {
url ="https://www.amazon.com/dp/B004S4ZA4K"
var content = UrlFetchApp.fetch(url).getContentText();
var match = content.search("In Stock.");
Logger.log(match)
}
The result in Logger is always
Info: -1.0
But the string is definitely in the source of the website, as you can see from this picture.
And the function itself is working: if I replace "In Stock." with "Amazon", for example, it returns a value that is not -1.0.
Disclaimer: I'm not a pro. I just want to make my life a bit easier. Help is highly appreciated.
Thanks, Elisa
User response:
Try it this way:
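function productTitle() {
  const url = "https://www.amazon.com/dp/B004S4ZA4K";
  const content = UrlFetchApp.fetch(url).getContentText();
  // search() returns the index of the first match, or -1 if there is none.
  // The i flag makes it case-insensitive; search() ignores the g flag.
  const match = content.search(/In Stock/gi);
  Logger.log(match);
}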
Execution log
12:40:47 AM Notice Execution started
12:40:48 AM Info 1040984.0
12:40:50 AM Notice Execution completed
According to the following, it occurs five times:
function productTitle() {
  const url = "https://www.amazon.com/dp/B004S4ZA4K";
  const content = UrlFetchApp.fetch(url).getContentText();
  // With the g flag, match() returns an array of every occurrence (or null if none).
  const match = content.match(/In Stock/gim);
  Logger.log(JSON.stringify(match));
}
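One caveat with match(): it returns null rather than an empty array when nothing matches, so reading .length on the result directly throws on a page that lacks the term (a Captcha page, for instance). A minimal sketch of a guard for that, written as plain JavaScript so it runs in Apps Script (V8) as well as in Node.js; the helper name countMatches is hypothetical:

```javascript
/**
 * Hypothetical helper: counts how often `pattern` occurs in `text`.
 * Plain JavaScript, so it runs in Apps Script (V8) and Node.js alike.
 */
function countMatches(text, pattern) {
  // Make sure the global flag is set so match() returns every occurrence
  // instead of only the first match with its capture details.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const matches = text.match(new RegExp(pattern.source, flags));
  // match() returns null (not []) when there are no matches.
  return matches === null ? 0 : matches.length;
}

// Usage in Apps Script:
// const content = UrlFetchApp.fetch(url).getContentText();
// Logger.log(countMatches(content, /In Stock/i));
```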
User response:
Testing on my end, the target URL returns a Captcha verification page when fetched via code, but not when opened in a browser.
That is expected: the purpose of the Captcha wall is to prevent automated scraping.
I'd recommend logging the fetch response contents when investigating this behavior.
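A minimal sketch of that kind of inspection. The summarizing logic is kept in a plain function so it can be tested outside Apps Script; the name describePage is hypothetical, and checking the body for the word "captcha" is an assumed heuristic for recognizing the verification page, not something Amazon documents.

```javascript
/**
 * Hypothetical helper: summarizes a fetched page so you can tell a real
 * product page from a Captcha/verification page before searching it.
 */
function describePage(statusCode, html) {
  // Heuristic (assumption): verification pages usually mention "captcha".
  const looksLikeCaptcha = /captcha/i.test(html);
  return {
    statusCode: statusCode,
    length: html.length,
    looksLikeCaptcha: looksLikeCaptcha,
    preview: html.slice(0, 500), // first 500 characters, for the log
  };
}

// Usage in Apps Script (UrlFetchApp.fetch with muteHttpExceptions,
// getResponseCode and getContentText are standard Apps Script calls):
// function inspect() {
//   const response = UrlFetchApp.fetch("https://www.amazon.com/dp/B004S4ZA4K",
//                                      { muteHttpExceptions: true });
//   const info = describePage(response.getResponseCode(), response.getContentText());
//   Logger.log(JSON.stringify(info, null, 2));
// }
```

If looksLikeCaptcha comes back true, searching the content for "In Stock" will keep returning -1 no matter how the regex is written.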
Although you may end up “tricking” Amazon’s Captcha wall (changing the User-Agent request header may work, for example), there is no guarantee that this behavior will be consistent.
I’d recommend using Amazon’s Product Advertising API to get consistent results.
Specifically for your scenario, the “In Stock” message will be located in the attribute
Offers.Listings.Availability.Message
of the API response. See more info here.
In addition, once the fetch response contents match your expected results, the regex could be made more robust by changing it to
/In\sStock/gi
(as pointed out by the user Cooper), which adds the global and case-insensitive flags. (More info here)
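To see what \s changes: a literal space in /In Stock/ matches only the ordinary space character, while \s matches any whitespace character, including a non-breaking space (\u00a0) or a newline that HTML markup may put between the two words. A small comparison on a made-up sample string:

```javascript
// Made-up HTML fragments: a regular space, a non-breaking space, a newline.
const sample = "<span>In Stock.</span> <span>In\u00a0Stock.</span> <span>in\nstock</span>";

// `|| []` guards against match() returning null when nothing matches.
const literal = sample.match(/In Stock/gi) || [];
const whitespace = sample.match(/In\sStock/gi) || [];

console.log(literal.length);    // 1
console.log(whitespace.length); // 3
```

The same comparison works in Apps Script if you swap console.log for Logger.log.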