I am checking whether I can use gstatic to fetch favicons from websites. The URL below fetches a website's favicon:
https://t2.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://yahoo.com&size=64
I understand that the URL parameters might not be intended for general use, but does anyone know where they might be documented?
UPDATE: I have just started building an app in Google Apps Script. I need to list website names along with their favicons and metadata such as the site description. Currently, the only approach I know is to fetch the web page, parse it with BeautifulSoup, and then locate the favicon. I came across the above link, which gives me the favicon directly! But I want to understand it better, so I am trying to find more information on the URL parameters for gstatic. I am also open to alternative ways to scrape a website from Google Apps Script...
Thanks
CodePudding user response:
I believe your goal is as follows.
- You want to retrieve the favicon from the websites.
- You want to use the following sample URL.
https://t2.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://yahoo.com&size=64
- From "I need to list website names along with their favicons and metadata like site description, etc.", you want to retrieve the favicon, title, and description of the site using Google Apps Script.
Sample script 1:
Using your URL https://t2.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://yahoo.com&size=64, how about the following sample script? Please copy and paste it into the script editor of Google Apps Script, and run sample1 in the script editor.
function sample1() {
  const uri = 'https://t2.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=https://yahoo.com&size=64';
  // Fetch the favicon as a blob (image data).
  const blob = UrlFetchApp.fetch(encodeURI(uri)).getBlob();
  // Save the blob as a file in the root folder of Google Drive.
  DriveApp.createFile(blob);
}
- When this script is run, the favicon is retrieved and saved as a file in the root folder of Google Drive.
- Judging from the URL, the favicon appears to be returned as image data.
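To reuse the same request for other sites, the URL can be assembled from its parts. The following is only a minimal sketch: `buildFaviconUrl` is a hypothetical helper name, the parameter values are copied from the sample URL above (they are not documented anywhere I know of), and it assumes the endpoint accepts a percent-encoded `url` value.

```javascript
// Hypothetical helper: builds a faviconV2 URL for a given site.
// The parameter names and values are taken from the sample URL in the
// question; they are undocumented and may change without notice.
function buildFaviconUrl(siteUrl, size) {
  const base = "https://t2.gstatic.com/faviconV2";
  const params = [
    "client=SOCIAL",
    "type=FAVICON",
    "fallback_opts=TYPE,SIZE,URL",
    // Assumption: the endpoint accepts a percent-encoded site URL.
    "url=" + encodeURIComponent(siteUrl),
    "size=" + size,
  ];
  return base + "?" + params.join("&");
}
```

In Apps Script this could be used as `UrlFetchApp.fetch(buildFaviconUrl("https://yahoo.com", 64)).getBlob()`. The result should not be passed through `encodeURI` again, because that would encode the `%` characters a second time.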
Sample script 2:
To retrieve the favicon URL, title, and description of the site, how about the following sample script?
function sample2() {
  const uri = 'https://yahoo.com'; // Please set the URL.
  const obj = { title: "", description: "", faviconUrl: "" };
  const res = UrlFetchApp.fetch(encodeURI(uri));
  const html = res.getContentText();

  // Title: the text between <title> and </title>.
  const title = html.match(/<title>(.+?)<\/title>/i);
  if (title && title.length > 1) {
    obj.title = title[1];
  }

  // Description: find the meta tag first, then read its content attribute.
  const description = html.match(/<meta.+name\="description".+>/i);
  if (description) {
    const d = description[0].match(/content\="(.+)"/i);
    if (d && d.length > 1) {
      obj.description = d[1];
    }
  }

  // Favicon: the href of the first tag that has rel="icon".
  const faviconUrl = html.match(/rel="icon".+?href\="(.+?)"/i);
  if (faviconUrl && faviconUrl.length > 1) {
    obj.faviconUrl = faviconUrl[1];
  }
  console.log(obj);
}
When this script is run, you can see the following value in the log.
{ "title":"Yahoo | Mail, Weather, Search, Politics, News, Finance, Sports & Videos", "description":"Latest news coverage, email, free stock quotes, live scores and video are just the beginning. Discover more every day at Yahoo!", "faviconUrl":"https://s.yimg.com/cv/apiv2/default/icons/favicon_y19_32x32_custom.svg" }
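On other sites, the `href` found this way may be relative (for example `/favicon.ico`), or there may be no `rel="icon"` tag at all. The following is a rough sketch of one way to handle that, not part of the script above: `resolveFaviconUrl` is a hypothetical helper, it falls back to the conventional `/favicon.ico` location when no `href` was found, and it uses plain string handling so that it does not depend on any URL-parsing API of the runtime.

```javascript
// Hypothetical helper: turns the href found in the page into an absolute URL.
// pageUrl is the URL that was fetched; href is the matched value (may be empty).
function resolveFaviconUrl(pageUrl, href) {
  const m = pageUrl.match(/^(https?:)\/\/([^\/?#]+)/i);
  if (!m) return href || "";
  const origin = m[1] + "//" + m[2];
  // No icon tag found: assume the conventional /favicon.ico location.
  if (!href) return origin + "/favicon.ico";
  // Already absolute (https:, data:, and so on): return unchanged.
  if (/^[a-z][a-z0-9+.-]*:/i.test(href)) return href;
  // Protocol-relative: reuse the scheme of the page.
  if (href.startsWith("//")) return m[1] + href;
  // Root-relative: attach to the origin.
  if (href.startsWith("/")) return origin + href;
  // Otherwise, relative to the directory of the page.
  const path = pageUrl.slice(origin.length).split(/[?#]/)[0];
  const dir = path.slice(0, path.lastIndexOf("/") + 1) || "/";
  return origin + dir + href;
}
```

For example, `obj.faviconUrl = resolveFaviconUrl(uri, obj.faviconUrl);` could be placed just before `console.log(obj)` in sample2. This sketch does not normalize `../` segments or honor a `<base>` tag.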