Home > database >  How to scrape images with Cheerio and paste to Google Sheets?
How to scrape images with Cheerio and paste to Google Sheets?

Time:10-08

This is my first trial to learn to how to scrape images from a web and paste them to Google Sheets. I want to download the second image from enter image description here

CodePudding user response:

I believe your goal as follows.

  • You want to retrieve the 2nd image of img tags and put it to the Spreadsheet.

In this HTML, it seems that the URL is https://ir.eia.gov/ngs/ filename. So I thought that the method of insertImage(url, column, row) can be used. When this is reflect to your script, how about the following modified script?

Modified script:

function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);
  const urls = [];
  $('img').each(function () {
    urls.push("https://ir.eia.gov/ngs/"   $(this).attr('src'));
  });
  if (urls.length > 1) {
    SpreadsheetApp.getActiveSheet().insertImage(urls[1], 1, 1); // 2nd image is retrieved.
  }
}
  • When this script is run, the URL of https://ir.eia.gov/ngs/ngs.gif is retrieved and the image is put to the Spreadsheet.

Reference:

Added:

About your following new question in the comment,

Thanks a lot! So other than calling the index of the image, is there no method to call either alt="Working Gas in Underground Storage Compared with Five-Year Range" or src="ngs.gif" in the code? I'm just curious to learn a smart way for a potential scenario, for instance, if a web has 20 images and the locations of those images keep changing day by day, so the second image is not always in the second place. Thank you again for any guide!

In this case, how about the following sample script?

Sample script:

function test() {
  const url = 'https://ir.eia.gov/ngs/ngs.html';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);

  const obj = [];
  $('img').each(function () {
    const t = $(this);
    const src = t.attr('src');
    obj.push({ alt: t.attr('alt'), src: src, url: "https://ir.eia.gov/ngs/"   src });
  });

  const searchAltValue = "Working Gas in Underground Storage Compared with Five-Year Range";
  const searchSrcValue = "ngs.gif";
  const ar = obj.filter(({alt, src}) => alt == searchAltValue && src == searchSrcValue);
  if (ar.length > 0) {
    SpreadsheetApp.getActiveSheet().insertImage(ar[0].url, 1, 1);
  }
}
  • In this sample script, when the values of src and alt are Working Gas in Underground Storage Compared with Five-Year Range and ngs.gif, respectively, the URL is retrieved and put to the image.
  • If you want to select Working Gas in Underground Storage Compared with Five-Year Range OR ngs.gif, please modify alt == searchAltValue && src == searchSrcValue to alt == searchAltValue || src == searchSrcValue.
  • Related