For each URL in ColA, loop through extracted data. Stuck on UrlFetchApp for multiple URLs

Time:03-01

The gist: I can successfully retrieve data using the original script for a single URL. Now I need to modify it to add a for-each loop that repeats the script for each URL in column A of my Google Sheet. After two days of trying things I think I'm on to something, but I know that this line, const str = UrlFetchApp.fetch(urls).getContentText();, is wrong. It worked when I had a const for a single URL, and I don't know how to change it (or where to put it) to say: fetch and get the content for each of the URLs in the range. I'm learning JavaScript as I go, so I'm still very much a newbie. Any help is much appreciated.

And yes, I've already been warned about not using Regex, but this works and I need it for Tuesday. I'll rewrite without Regex later.

function LoopURLs() {
Loop_clearRecords();
 let ss = SpreadsheetApp.getActive();
 let sheet = ss.getSheetByName("UserDiary");
 let urlRange = sheet.getRange('A3:A15');
 let urls = urlRange.getValues();
 const str = UrlFetchApp.fetch(urls).getContentText();
  const mainRegex = /<td >([\s\S]*?)>Edit this entry\<\/a>/gi;
  const results = str.match(mainRegex);
  const filmIDRegex = /data-film-id="([0-9]*?)"/i;
  const filmTitleRegex = /data-film-name="([\s\S]*?)"/i;

 urls.forEach(function(row){
   Logger.log('For each loop: ' + row);
   for(var i = 0; i < results.length; i++) {
    const filmIDResults = (results[i].match(filmIDRegex) || ['','']);
    const filmID1 = filmIDResults[0];
    const filmID = filmIDResults[1];
    Logger.log('filmIDcode: ' + filmID1);
    Logger.log('filmID: ' + filmID);
    const filmTitleResults = (results[i].match(filmTitleRegex) || ['','']);
    const filmTitle1 = filmTitleResults[0];
    const filmTitle = filmTitleResults[1];
    Logger.log('titlecode: ' + filmTitle1);
    Logger.log('title: ' + filmTitle);
 };
})

Loop_addRecord(filmTitle, filmID );
  }

function Loop_clearRecords()
{
  var ss= SpreadsheetApp.getActiveSpreadsheet();
  var tableSheet = ss.getSheetByName("UserDiary");
 // select range to clear
  tableSheet.getRange("C3:Z100").clear();
}
function Loop_addRecord(filmTitle, filmID) {
  var ss= SpreadsheetApp.getActiveSpreadsheet();
  var tableSheet = ss.getSheetByName("UserDiary");
  var currentRow = tableSheet.getLastRow();
  var nextRow = currentRow + 1;
  tableSheet.getRange(nextRow,4).setValue(filmID);
  tableSheet.getRange(nextRow,5).setValue(filmTitle);
  
}

CodePudding user response:

From your following replies,

the original script has one URL hardcoded in it, and it gets filmTitle and filmID for that URL. But I have multiple URLs in cells A3-A13, and I would like the script to run for each of those URLs; so for URL #1 it runs and gets filmTitle and filmID and outputs them, then it gets URL #2, gets filmTitle and filmID, outputs them below, and it repeats for all the URLs in Col A. Also, I truncated the script displayed here to only get two values (filmTitle and filmID) just to make it simpler, as I can add the other values I'm looking for later once I know how to loop the 'for each url', do this...

one of the reasons I simplified the script here is because I thought it was good form; I've seen many people complain to posters that their code is way too long and that they should just ask for the essentials.

I understand the goal of your question to be as follows.

  • You want to retrieve the URLs from the cells "A3:A13", retrieve the two values filmID and filmTitle from each URL, and write the retrieved values to the sheet starting at cell "C3".

In this case, how about the following modified script?

Modified script:

function LoopURLs() {
  let ss = SpreadsheetApp.getActive();
  let sheet = ss.getSheetByName("UserDiary");
  sheet.getRange("C3:Z100").clear();
  let urlRange = sheet.getRange('A3:A13');
  let urls = urlRange.getValues().map(([a]) => ({url: a, muteHttpExceptions: true}));
  const res = UrlFetchApp.fetchAll(urls);
  const values = res.flatMap(e => {
    if (e.getResponseCode() == 200) {
      const str = e.getContentText();
      const mainRegex = /<td >([\s\S]*?)>Edit this entry\<\/a>/gi;
      const results = str.match(mainRegex);
      const filmIDRegex = /data-film-id="([0-9]*?)"/i;
      const filmTitleRegex = /data-film-name="([\s\S]*?)"/i;
      const temp = [];
      if (results && results.length > 0) {
        for (var i = 0; i < results.length; i++) {
          const filmIDResults = (results[i].match(filmIDRegex) || ['', '']);
          const filmID = filmIDResults[1];
          const filmTitleResults = (results[i].match(filmTitleRegex) || ['', '']);
          const filmTitle = filmTitleResults[1];
          temp.push([filmID, filmTitle]);
        }
        return temp;
      }
    }
    return [Array(2).fill("")];
  });
  sheet.getRange(3, 3, values.length, values[0].length).setValues(values);
}
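  • In this modification, all URLs are requested in one call with the fetchAll method. The values are then extracted from each response and written to the sheet in a single setValues call, starting at cell "C3": filmID goes in column C and filmTitle in column D. A URL that does not return status 200, or whose page has no matching entries, produces one blank row.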
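
The original line failed because getValues() returns a two-dimensional array (one inner array per row), while UrlFetchApp.fetch expects a single URL string. The map(([a]) => ...) call in the script unpacks each row into a request object for fetchAll. A minimal sketch of that step in plain JavaScript follows. Here buildRequests is a hypothetical helper name, the sample URLs are placeholders, and skipping blank cells is an added assumption: the script above sends every cell in the range, blank or not, which UrlFetchApp may reject.

```javascript
// Hypothetical helper: turn the 2D array from Range.getValues()
// into the request objects that UrlFetchApp.fetchAll() accepts.
function buildRequests(values) {
  return values
    .map(([cell]) => String(cell).trim())   // first (only) column of each row
    .filter(url => url !== '')              // assumption: ignore blank cells
    .map(url => ({ url: url, muteHttpExceptions: true }));
}

// Placeholder data shaped like sheet.getRange('A3:A5').getValues()
const sampleValues = [
  ['https://example.com/page/1'],
  [''],
  ['https://example.com/page/2'],
];
const requests = buildRequests(sampleValues);
console.log(requests.length); // 2
```

Inside LoopURLs, the result of buildRequests(urlRange.getValues()) could be passed to UrlFetchApp.fetchAll in place of the inline map.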

Note:

  • This modified script assumes that filmTitle and filmID can actually be retrieved from your URLs with these regular expressions. Please check this against your own pages.
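
One way to check that assumption before running against the live pages is to try the same regular expressions on a saved snippet of HTML. The sketch below is plain JavaScript. extractFilms is a hypothetical helper, and the sample markup is invented to fit the patterns in the script; it is not copied from the real site.

```javascript
// Hypothetical helper: pull [filmID, filmTitle] pairs out of one page's HTML,
// using the same regular expressions as the modified script.
function extractFilms(html) {
  const mainRegex = /<td >([\s\S]*?)>Edit this entry<\/a>/gi;
  const filmIDRegex = /data-film-id="([0-9]*?)"/i;
  const filmTitleRegex = /data-film-name="([\s\S]*?)"/i;
  const blocks = html.match(mainRegex) || [];   // match() returns null when nothing is found
  return blocks.map(block => {
    const id = (block.match(filmIDRegex) || ['', ''])[1];
    const title = (block.match(filmTitleRegex) || ['', ''])[1];
    return [id, title];
  });
}

// Invented sample markup with two entries
const sampleHtml =
  '<td ><div data-film-id="101" data-film-name="First Film"></div>' +
  '<a href="#">Edit this entry</a></td>' +
  '<td ><div data-film-id="202" data-film-name="Second Film"></div>' +
  '<a href="#">Edit this entry</a></td>';

const films = extractFilms(sampleHtml);
console.log(films);
```

If the helper returns an empty array for a page you expect to contain entries, the patterns probably do not match that page's markup.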
