Home > Net >  Google Apps Script Blocked by Mod Security Error 406
Google Apps Script Blocked by Mod Security Error 406

Time:01-02

I want to scrape a website using Google Apps Script. unfortunately, it giving me an error 406.

below is the full details of the error:

resp:
"<html><head><title>Error 406 - Not Acceptable</title><head><body><h1>Error 406 - Not Acceptable</h1><p>Generally a 406 error is caused because a request has been blocked by Mod Security. If you belie…"

Below is a sample of the code:

var ss = SpreadsheetApp.getActiveSpreadsheet();

var options = {

  'method' : 'get',

  'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)
 
Chrome/51.0.2704.103 Safari/537.36',

  'muteHttpExceptions': true,

  Authorization: 'Bearer ?????',

  Accept: 'application/json',

  'Content-Type': 'application/json',

}


function scrapeJobs(){

  var result = [];

  for(var i =1; i <= 2; i  ){

    var url = 'https://www.xxxxxxx/page/'    i;

   var resp = UrlFetchApp.fetch(url, options,).getContentText();

var $ = Cheerio.load(resp);

var jobList = $("#titlo > strong > a");

var urls =  jobList.map(function() {return $(this).attr('href');}).toArray();

// debug code - outputs the urls it collected

console.log(urls);

for (let i = 0; i < urls.length; i  ) {

  var data = scrapeJobDetails(urls[i]);

  if (data != null) {

    result.push(...data);

  }

  }

 }

I tried following the answer to a similar question asked here but no success. see the link below:

Google App Script external API return error 406

CodePudding user response:

To fetch a website for scraping, simply use

function myFunction() {
  Logger.log(UrlFetchApp.fetch('<url of website>', {
    muteHttpExceptions: true
  }).getContentText());
}

(When running myFunction in the Apps Script editor, this logs the contents of the website.)

The muteHttpExceptions is necessary if you want to scape also error pages like "404 Not Found". (Without it, such a page leads to a catchable exception.)

CodePudding user response:

The reference to ModSecurity is very odd. ModSec does not return status code 406 unless with special configuration. So the whole "Generally a 406 error is caused because a request has been blocked by Mod Security ..." is probably a generic message misleading you here.

  • Related