This is my first trial to learn to how to scrape images from a web and paste them to Google Sheets. I want to download the second image from
CodePudding user response:
I believe your goal as follows.
- You want to retrieve the 2nd image of
img
tags and put it to the Spreadsheet.
In this HTML, it seems that the URL is https://ir.eia.gov/ngs/
filename
. So I thought that the method of insertImage(url, column, row)
can be used. When this is reflect to your script, how about the following modified script?
Modified script:
function test() {
const url = 'https://ir.eia.gov/ngs/ngs.html';
const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
const $ = Cheerio.load(res);
const urls = [];
$('img').each(function () {
urls.push("https://ir.eia.gov/ngs/" $(this).attr('src'));
});
if (urls.length > 1) {
SpreadsheetApp.getActiveSheet().insertImage(urls[1], 1, 1); // 2nd image is retrieved.
}
}
- When this script is run, the URL of
https://ir.eia.gov/ngs/ngs.gif
is retrieved and the image is put to the Spreadsheet.
Reference:
Added:
About your following new question in the comment,
Thanks a lot! So other than calling the index of the image, is there no method to call either alt="Working Gas in Underground Storage Compared with Five-Year Range" or src="ngs.gif" in the code? I'm just curious to learn a smart way for a potential scenario, for instance, if a web has 20 images and the locations of those images keep changing day by day, so the second image is not always in the second place. Thank you again for any guide!
In this case, how about the following sample script?
Sample script:
function test() {
const url = 'https://ir.eia.gov/ngs/ngs.html';
const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
const $ = Cheerio.load(res);
const obj = [];
$('img').each(function () {
const t = $(this);
const src = t.attr('src');
obj.push({ alt: t.attr('alt'), src: src, url: "https://ir.eia.gov/ngs/" src });
});
const searchAltValue = "Working Gas in Underground Storage Compared with Five-Year Range";
const searchSrcValue = "ngs.gif";
const ar = obj.filter(({alt, src}) => alt == searchAltValue && src == searchSrcValue);
if (ar.length > 0) {
SpreadsheetApp.getActiveSheet().insertImage(ar[0].url, 1, 1);
}
}
- In this sample script, when the values of
src
andalt
areWorking Gas in Underground Storage Compared with Five-Year Range
andngs.gif
, respectively, the URL is retrieved and put to the image. - If you want to select
Working Gas in Underground Storage Compared with Five-Year Range
ORngs.gif
, please modifyalt == searchAltValue && src == searchSrcValue
toalt == searchAltValue || src == searchSrcValue
.