Extracting data from web page using Cheerio Library

Time:06-19

I am trying to scrape a small piece of information from a web page using Cheerio and Google Apps Script.


Here is the code snippet I am using:

function LinkResult() {
  var url = 'https://pagespeed.web.dev/report?url=http://www.juicecoldpressed.com/';

  var result = UrlFetchApp.fetch(url);
  var content = Cheerio.load(result.getContentText());
  var item = content(".tag").text();

  Logger.log(item);
}

When I run this code, the variable item is empty, so nothing useful is logged. I am surely missing something; can you please guide me? Thank you.

Answer:

Issue and workaround:

In this case, I'm worried that your goal cannot be achieved directly by fetching https://pagespeed.web.dev/report?url=http://www.juicecoldpressed.com/ and parsing it with Cheerio, because the HTML returned by UrlFetchApp.fetch(url) is different from the page you see in the browser. The report values appear to be calculated and inserted by client-side scripts, and UrlFetchApp only downloads the raw HTML without running any JavaScript, so content(".tag") has nothing to select.
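A quick way to confirm this diagnosis is to check whether the class you are selecting appears in the raw HTML at all. The helper below is a hypothetical sketch (`hasClassInMarkup` is not part of Cheerio or Apps Script); it scans `class` attributes with a regular expression, which is good enough for a yes/no check but is not a real HTML parser.

```javascript
/**
 * Hypothetical helper: returns true if any class attribute in the raw HTML
 * contains the given class name as a whole token.
 * A regex scan is only a rough check, not a full HTML parser.
 */
function hasClassInMarkup(html, className) {
  // Match class="..." or class='...' and capture the attribute value.
  const classAttr = /class\s*=\s*(["'])(.*?)\1/gi;
  let match;
  while ((match = classAttr.exec(html)) !== null) {
    // Split on whitespace so "tag" matches "tag big" but not "tagline".
    const tokens = match[2].split(/\s+/);
    if (tokens.includes(className)) {
      return true;
    }
  }
  return false;
}

// Static markup that already contains the element.
console.log(hasClassInMarkup('<span class="tag big">98</span>', "tag")); // true
// An app shell whose content would only be rendered later by JavaScript.
console.log(hasClassInMarkup('<div id="app"></div><script src="app.js"></script>', "tag")); // false
```

In Apps Script you would call it as `hasClassInMarkup(UrlFetchApp.fetch(url).getContentText(), "tag")`; a `false` result means the element is not in the HTML that Cheerio receives, so no selector can find it.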

Fortunately, the values you want can be retrieved with the PageSpeed Insights API. In this answer, I propose achieving your goal with that API instead of scraping the report page.

Usage:

1. Get Started with the PageSpeed Insights API.

Please check the official documentation for the PageSpeed Insights API. The sample script below uses an API key, so please create one and enable the PageSpeed Insights API in the API console.
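The API is called with a plain GET request. The sketch below (hypothetical function name `buildPagespeedUrl`) builds a `runPagespeed` URL with the same query parameters the sample script uses: the key, the target page, the `performance` category and one `strategy` value. It only relies on `encodeURIComponent`, so it should run unchanged in Apps Script and in Node.js.

```javascript
// Hypothetical helper that builds the PageSpeed Insights v5 request URL
// with the same query parameters as the sample script.
function buildPagespeedUrl(apiKey, pageUrl, strategy) {
  const params = {
    key: apiKey,
    url: pageUrl,
    category: "performance",
    strategy: strategy, // "desktop" or "mobile"
  };
  // Percent-encode every value so the target URL's ":" and "/" are safe in a query string.
  const query = Object.keys(params)
    .map((name) => `${name}=${encodeURIComponent(params[name])}`)
    .join("&");
  return `https://www.googleapis.com/pagespeedonline/v5/runPagespeed?${query}`;
}

console.log(buildPagespeedUrl("YOUR_API_KEY", "http://www.juicecoldpressed.com/", "mobile"));
// https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key=YOUR_API_KEY&url=http%3A%2F%2Fwww.juicecoldpressed.com%2F&category=performance&strategy=mobile
```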

2. Sample script.

function myFunction() {
  const apiKey = "###"; // Please set your API key.
  const url = "http://www.juicecoldpressed.com/"; // Please set URL.

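  const apiEndpoint = `https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key=${apiKey}&url=${encodeURIComponent(url)}&category=performance`;
  const strategy = ["desktop", "mobile"];

  // Request both strategies in one fetchAll call; muteHttpExceptions returns failed responses instead of throwing.
  const res = UrlFetchApp.fetchAll(strategy.map(e => ({ url: `${apiEndpoint}&strategy=${e}`, muteHttpExceptions: true })));
  const values = res.reduce((o, r, i) => {
    if (r.getResponseCode() === 200) {
      const obj = JSON.parse(r.getContentText());
      // The API returns a score from 0 to 1; round after scaling to avoid values such as 56.99999999999999.
      o[strategy[i]] = Math.round(obj.lighthouseResult.categories.performance.score * 100);
    } else {
      // Record null when the request for this strategy failed.
      o[strategy[i]] = null;
    }
    return o;
  }, {});
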
  console.log(values);
}
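Because `UrlFetchApp` only exists inside Apps Script, the parsing step is hard to test on its own. The sketch below pulls that step out into a pure function; the name `scoresFromResponses` and the `{ code, body }` input shape are assumptions for illustration, standing in for `getResponseCode()` and `getContentText()`. It reads the same `lighthouseResult.categories.performance.score` field as the sample script, and it also returns `null` when that field is missing, not only on a non-200 status.

```javascript
// Hypothetical pure version of the reduce() step in the sample script.
// Each response is { code, body }, standing in for getResponseCode() and getContentText().
function scoresFromResponses(strategies, responses) {
  const values = {};
  strategies.forEach((strategy, i) => {
    const res = responses[i];
    let score = null;
    if (res && res.code === 200) {
      const obj = JSON.parse(res.body);
      // Walk the path step by step so a missing field yields null instead of a TypeError.
      const lr = obj.lighthouseResult;
      const perf = lr && lr.categories && lr.categories.performance;
      if (perf && typeof perf.score === "number") {
        score = Math.round(perf.score * 100); // 0 to 1 score -> percentage
      }
    }
    values[strategy] = score;
  });
  return values;
}

// Minimal fake bodies containing only the fields the function reads.
const fake = [
  { code: 200, body: JSON.stringify({ lighthouseResult: { categories: { performance: { score: 0.91 } } } }) },
  { code: 429, body: "{}" },
];
console.log(scoresFromResponses(["desktop", "mobile"], fake)); // { desktop: 91, mobile: null }
```

Inside Apps Script you could build the input with `res.map(r => ({ code: r.getResponseCode(), body: r.getContentText() }))` and pass it together with the `strategy` array.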

3. Testing.

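When this script is run, an object of the form { desktop: ##, mobile: ## } is shown in the log. The desktop and mobile values are the performance scores for the desktop and mobile strategies, respectively, expressed as percentages (the API's 0 to 1 score multiplied by 100). A value of null means the request for that strategy did not return HTTP 200.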
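Multiplying the API's 0 to 1 score by 100 does not always give a whole number, because the score is a binary floating-point value: in JavaScript, `0.57 * 100` evaluates to `56.99999999999999`. A small rounding helper (hypothetical name `scoreToPercent`) keeps the logged percentages tidy, and its `null` case matches the value the sample script records for a failed request.

```javascript
// Hypothetical helper: convert a Lighthouse score in [0, 1] to a whole-number percentage.
// Returns null for a missing score, matching the sample script's failure value.
function scoreToPercent(score) {
  if (typeof score !== "number") {
    return null;
  }
  return Math.round(score * 100);
}

console.log(0.57 * 100);           // 56.99999999999999
console.log(scoreToPercent(0.57)); // 57
console.log(scoreToPercent(null)); // null
```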
