HTML Scraper Script w/ Pagination

Time:06-21

Basically, I'm trying to pull in lists like the one shown in the screenshot.

Note:

  • If you want to use this script as a custom function, how about the following script? In this case, please put the custom function =SAMPLE() into a cell.

      function SAMPLE() {
        const maxPage = 10; // From your question, the max page number is 10.
        const reqs = [...Array(maxPage)].map((_, i) => ({ url: `https://letterboxd.com/prof_ratigan/list/top-1000-films-of-all-time-calculated/detail/page/${i + 1}/`, muteHttpExceptions: true }));
        return UrlFetchApp.fetchAll(reqs).flatMap((r, i) => {
          if (r.getResponseCode() != 200) {
            return [["Values couldn't be retrieved.", reqs[i].url]];
          }
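          // Cheerio refers to the Cheerio library for Google Apps Script, which must be added to the project.
          const $ = Cheerio.load(r.getContentText());
          // The selector matches the title links (h2 > a) and what appear to be the year links (small > a), in document order.
          const ar = $('li > div.film-detail-content > h2 > a , small > a').toArray();
          // Take the matched elements two at a time to build [title, year] rows.
          return [...Array(Math.ceil(ar.length / 2))].map((_) => {
            const temp = ar.splice(0, 2);
            return [$(temp[0]).text().trim(), Number($(temp[1]).text().trim())];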
          });
        });
      }
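
The pairing step in the script above relies on the selector returning the two kinds of links in strictly alternating order. As a minimal sketch of that step in isolation, the following plain JavaScript uses strings in place of Cheerio elements. The helper name `pairTitlesAndYears` and the sample data are hypothetical and are not part of the original script. If a film on the page lacked its second link, the alternation would break and later rows would be misaligned, so this is only an illustration of the assumption, not a fix for it.

```javascript
// Hypothetical helper: groups a flat, alternating [title, year, title, year, ...]
// list into [title, year] rows, similar to the splice(0, 2) loop in SAMPLE().
function pairTitlesAndYears(texts) {
  const rows = [];
  for (let i = 0; i < texts.length; i += 2) {
    const title = String(texts[i]).trim();
    // If the input has odd length, the last row has no year; NaN marks it here.
    // (SAMPLE() itself would produce 0 in that case, because Number("") is 0.)
    const year = i + 1 < texts.length ? Number(String(texts[i + 1]).trim()) : NaN;
    rows.push([title, year]);
  }
  return rows;
}

// Made-up sample data, not taken from the actual page.
const sample = [" Film A ", "1999", "Film B", " 2004 "];
console.log(pairTitlesAndYears(sample));
```

Returning a two-dimensional array of rows is what lets a custom function such as =SAMPLE() fill a two-column range in the sheet.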
    

Note:

  • This sample script is for the current HTML at the URL. If the structure of the site changes, this script might no longer work. Please be careful about this.
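
One way to notice such a change early is to sanity-check the returned rows before trusting them. The following is only a sketch under stated assumptions: the helper name `findSuspectRows` and the year bounds (1880 to 2100) are hypothetical choices for illustration, and the rows are assumed to have the [title, year] shape produced by the script.

```javascript
// Hypothetical check: returns the rows that do not look like
// [non-empty title string, plausible integer year].
function findSuspectRows(rows) {
  return rows.filter(([title, year]) => {
    const titleOk = typeof title === "string" && title.length > 0;
    // The bounds 1880..2100 are an arbitrary assumption for illustration.
    const yearOk = Number.isInteger(year) && year >= 1880 && year <= 2100;
    return !(titleOk && yearOk);
  });
}

// Made-up rows: the second has an empty title, the third a year of 0,
// which is what Number("") yields when no year text was found.
const rows = [["Film A", 1999], ["", 2004], ["Film C", 0]];
console.log(findSuspectRows(rows));
```

If many rows are flagged, the selectors probably no longer match the page and need to be updated.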
