I want to extract a website's title, image, and description in Node.js.
I was using the link-preview-js library, but it does not extract any data for a specific link. What can I do?
CodePudding user response:
You can use the html-metadata-parser library in Node.js:
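const Meta = require('html-metadata-parser');

// Fetch the page and print its parsed metadata as formatted JSON.
const getPageMetadata = async () => {
  const result = await Meta.parser("https://www.usnews.com/education/best-colleges/articles/college-applications-are-on-the-rise-what-to-know");
  console.log(JSON.stringify(result, null, 3));
};

// Log network or parsing failures instead of leaving the rejection unhandled.
getPageMetadata().catch(console.error);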
The output has the following shape (this sample comes from a YouTube video URL, not the article URL above):
{
   "meta": {
      "title": "YouTube",
      "description": "Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube."
   },
   "og": {
      "images": [
         {
            "url": "https://i.ytimg.com/vi/GN2nFJ9Ku6Q/maxresdefault_live.jpg",
            "width": 1280,
            "height": 720
         }
      ],
      "videos": [
         {
            "url": "https://www.youtube.com/embed/GN2nFJ9Ku6Q",
            "secure_url": "https://www.youtube.com/embed/GN2nFJ9Ku6Q",
            "type": "text/html",
            "width": 480,
            "height": 360
         }
      ],
      "site_name": "YouTube",
      "url": "https://www.youtube.com/watch?v=GN2nFJ9Ku6Q",
      "title": "Create Resilience – Pivoting a running business in real time",
      "image": "https://i.ytimg.com/vi/GN2nFJ9Ku6Q/maxresdefault_live.jpg",
      "description": "What do you do when the lockdown in response to a global pandemic shuts down your business model? Brian Fitzpatrick, Founder and CTO of Tock, led his company...",
      "type": "video.other"
   },
   "images": []
}
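Since the question asks for just the title, image, and description, here is a minimal sketch of how those three fields could be pulled out of a result shaped like the sample output. The helper name pickPreview is hypothetical and not part of html-metadata-parser; it assumes the "og" and "meta" keys shown in the sample, prefers Open Graph values, and falls back to the plain meta values.

```javascript
// Hypothetical helper (not part of html-metadata-parser).
// Assumes a result object shaped like the sample output: { meta, og, images }.
function pickPreview(result) {
  const meta = (result && result.meta) || {};
  const og = (result && result.og) || {};

  // og.images is an array of { url, width, height } objects in the sample output.
  const ogImages = Array.isArray(og.images) ? og.images : [];
  const firstOgImage = ogImages.length > 0 ? ogImages[0].url : undefined;

  return {
    // Prefer Open Graph values, then fall back to the plain meta tags.
    title: og.title || meta.title || null,
    description: og.description || meta.description || null,
    image: og.image || firstOgImage || null,
  };
}
```

Calling pickPreview(result) inside getPageMetadata would return an object with exactly these three keys, each set to null when the page does not provide the field.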
CodePudding user response:
Try sending a different user-agent header with the link-preview-js library. Here's a list of user agents you can switch to: https://whatmyuseragent.com/engines
Solution:
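const { getLinkPreview } = require('link-preview-js');

// `link` is the URL to preview. The wrapper function is only here because
// `await` needs an async context in a CommonJS module.
async function fetchPreview(link) {
  const linkResult = await getLinkPreview(link, {
    timeout: 10000,
    followRedirects: "manual",
    // Only follow redirects that stay on the same host or add a "www." prefix.
    handleRedirects: (baseURL, forwardedURL) => {
      const base = new URL(baseURL).hostname;
      const forwarded = new URL(forwardedURL).hostname;
      return forwarded === base || forwarded === "www." + base;
    },
    headers: {
      "user-agent": "Mozilla/5.0 (Linux; Android 11; vivo 1904; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/87.0.4280.141 Mobile Safari/537.36 VivoBrowser/8.7.0.1"
    }
  });
  return linkResult;
}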
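The redirect check above accepts a redirect only when the hostname is unchanged or gains a "www." prefix. As a rough sketch, the same idea can be written as a standalone function that treats "www." as optional in both directions and rejects anything that is not an absolute URL. The name isSameSiteRedirect is hypothetical; it is not part of link-preview-js, but it has the (baseURL, forwardedURL) signature that handleRedirects expects.

```javascript
// Hypothetical helper (not part of link-preview-js).
// Returns true when both URLs point at the same host, ignoring a leading "www.".
function isSameSiteRedirect(baseURL, forwardedURL) {
  const stripWww = (hostname) => hostname.replace(/^www\./, "");
  try {
    const base = stripWww(new URL(baseURL).hostname);
    const forwarded = stripWww(new URL(forwardedURL).hostname);
    return base === forwarded;
  } catch (err) {
    // new URL() throws on relative or malformed URLs; treat those as not allowed.
    return false;
  }
}
```

It could then be passed as handleRedirects: isSameSiteRedirect. Whether relative redirect targets should be rejected or resolved against the base URL is a design choice; this sketch rejects them to stay on the safe side.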