Get title, image and description of a web page from Node JS

Time:07-21

I want to extract a website's title, image and description from Node JS.

Website: https://www.usnews.com/education/best-colleges/articles/college-applications-are-on-the-rise-what-to-know

I was using the link-preview-js library, but it does not extract any data for this specific link. What can I do?

CodePudding user response:

You can use the html-metadata-parser library in Node.js:

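const Meta = require('html-metadata-parser');

// Fetch the page and print its parsed metadata as formatted JSON.
const getPageMetadata = async () => {
    const result = await Meta.parser("https://www.usnews.com/education/best-colleges/articles/college-applications-are-on-the-rise-what-to-know");
    console.log(JSON.stringify(result, null, 3));
};

// Report network or parsing failures instead of leaving the rejection unhandled.
getPageMetadata().catch(console.error);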

The result is an object shaped like the one below. Note that this sample output is for a YouTube video URL, not the US News article above:

{
   "meta": {
      "title": "YouTube",
      "description": "Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube."
   },
   "og": {
      "images": [
         {
            "url": "https://i.ytimg.com/vi/GN2nFJ9Ku6Q/maxresdefault_live.jpg",
            "width": 1280,
            "height": 720
         }
      ],
      "videos": [
         {
            "url": "https://www.youtube.com/embed/GN2nFJ9Ku6Q",
            "secure_url": "https://www.youtube.com/embed/GN2nFJ9Ku6Q",
            "type": "text/html",
            "width": 480,
            "height": 360
         }
      ],
      "site_name": "YouTube",
      "url": "https://www.youtube.com/watch?v=GN2nFJ9Ku6Q",
      "title": "Create Resilience – Pivoting a running business in real time",
      "image": "https://i.ytimg.com/vi/GN2nFJ9Ku6Q/maxresdefault_live.jpg",
      "description": "What do you do when the lockdown in response to a global pandemic shuts down your business model? Brian Fitzpatrick, Founder and CTO of Tock, led his company...",
      "type": "video.other"
   },
   "images": []
}
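
To get just the title, image and description out of a result like this, a small helper can prefer the Open Graph fields and fall back to the plain meta fields. This is a minimal sketch: pickPreview is a hypothetical name, not part of html-metadata-parser, and it assumes the result has the "meta" and "og" shape shown in the sample output above.

```javascript
// Hypothetical helper (not part of html-metadata-parser): reduce a parsed
// result to the three fields asked for in the question.
// Assumes the "meta" / "og" shape shown in the sample output.
function pickPreview(result) {
    const meta = (result && result.meta) || {};
    const og = (result && result.og) || {};
    const ogImages = Array.isArray(og.images) ? og.images : [];
    const firstOgImage = ogImages.length > 0 ? ogImages[0].url : undefined;

    return {
        // Prefer Open Graph values, then fall back to the plain meta tags.
        title: og.title || meta.title || null,
        description: og.description || meta.description || null,
        image: og.image || firstOgImage || null,
    };
}

// Usage with a trimmed-down copy of the sample output.
const sample = {
    meta: { title: "YouTube" },
    og: {
        title: "Create Resilience",
        image: "https://i.ytimg.com/vi/GN2nFJ9Ku6Q/maxresdefault_live.jpg",
    },
    images: [],
};
console.log(pickPreview(sample));
```

Fields that are missing from both places come back as null, so the caller can decide how to handle a page without a preview image or description.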

CodePudding user response:

If link-preview-js returns no data for a link, try sending a different user-agent header with the request. Here's a list of user-agents you can switch to: https://whatmyuseragent.com/engines

Solution:

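const { getLinkPreview } = require('link-preview-js');

// Fetch preview data for a link while sending a browser-like user-agent header.
const getPreview = async (link) => {
    return getLinkPreview(link, {
        timeout: 10000,
        followRedirects: "manual",
        // Only follow redirects that stay on the same host or go to its "www." variant.
        handleRedirects: (baseURL, forwardedURL) => {
            const base = new URL(baseURL).hostname;
            const forwarded = new URL(forwardedURL).hostname;
            return forwarded === base || forwarded === "www." + base;
        },
        headers: {
            "user-agent": "Mozilla/5.0 (Linux; Android 11; vivo 1904; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/87.0.4280.141 Mobile Safari/537.36 VivoBrowser/8.7.0.1"
        }
    });
};

getPreview("https://www.usnews.com/education/best-colleges/articles/college-applications-are-on-the-rise-what-to-know")
    .then(console.log)
    .catch(console.error);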
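
The redirect check in the snippet above can be pulled out and tried on its own, without any network access. The following is a minimal sketch of that same hostname comparison; isSameSiteRedirect is a hypothetical name used only for illustration, and the example URLs are made up.

```javascript
// Hypothetical standalone version of the handleRedirects callback above.
// Returns true when the redirect target has the same hostname as the
// original URL, or the same hostname with a "www." prefix added.
function isSameSiteRedirect(baseURL, forwardedURL) {
    const base = new URL(baseURL).hostname;
    const forwarded = new URL(forwardedURL).hostname;
    return forwarded === base || forwarded === "www." + base;
}

// Same host: allowed.
console.log(isSameSiteRedirect("https://usnews.com/a", "https://usnews.com/b"));
// Bare host redirecting to its "www." variant: allowed.
console.log(isSameSiteRedirect("https://usnews.com/a", "https://www.usnews.com/a"));
// Redirect to a different site: rejected.
console.log(isSameSiteRedirect("https://usnews.com/a", "https://example.com/a"));
```

Note that the check is one-directional: a redirect from the "www." host back to the bare host is rejected, because only the forwarded hostname is compared against a "www."-prefixed base.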