How to scrape part of a page node.js

Time:07-18

I'm fairly new to coding and I don't understand why my code isn't working. It is supposed to output "Dominus Empyreus", but it only outputs [].

Here is my current code:

const axios = require('axios'); 
const cheerio = require('cheerio'); 
 
const extractLinks = $ => [ 
    ...new Set( 
        $('.border-bottom item-name-container') // Select pagination links 
            .toArray() // Convert cheerio object to array 
    ), 
]; 
 
axios.get('https://www.roblox.com/catalog/21070012/Dominus-Empyreus').then(({ data }) => { 
    const $ = cheerio.load(data); // Initialize cheerio 
    const stuff = extractLinks($); 
 
    console.log(stuff); 
    // ['Dominus Empyreus'] 
});

Any help is appreciated! Thanks.

CodePudding user response:

This is the element you are trying to select.

<div class="border-bottom item-name-container">

You have written:

.border-bottom item-name-container

Which consists of:

  • A class selector (for class="border-bottom")
  • A descendant combinator
  • A type selector (for <item-name-container>)

What you need is:

  • A class selector (for class="border-bottom")
  • Another class selector (for class="item-name-container")
  • No descendant combinator (because you are targeting two features of the same element, not one element that is a descendant of the other).

Like this:

.border-bottom.item-name-container
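
To make the difference concrete, here is a small sketch that breaks both selectors into their parts. It is a toy model, not how cheerio parses selectors: the function name describeSelector is made up for this example, and it only understands type and class selectors separated by whitespace.

```javascript
// Toy model only: splits a selector on whitespace (the descendant combinator),
// then splits each part into its type and class selectors.
// It does not cover the full CSS selector grammar.
function describeSelector(selector) {
  return selector
    .trim()
    .split(/\s+/) // whitespace between parts acts as a descendant combinator
    .map((compound) => {
      // A leading name with no dot in front of it is a type (tag) selector.
      const typeMatch = compound.match(/^[A-Za-z][\w-]*/);
      // Every ".name" is a class selector applying to the same element.
      const classes = [...compound.matchAll(/\.([\w-]+)/g)].map((m) => m[1]);
      return { type: typeMatch ? typeMatch[0] : null, classes };
    });
}

// Two parts: an element with class "border-bottom", then a descendant
// element whose tag name is <item-name-container>.
const broken = describeSelector('.border-bottom item-name-container');

// One part: a single element that carries both classes.
const fixed = describeSelector('.border-bottom.item-name-container');

console.log(broken);
console.log(fixed);
```

The first selector yields two entries, the second of which is a tag name that does not exist on the page, which is why nothing matches. The second selector yields a single entry with both classes.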

CodePudding user response:

An alternative is to use the full selector path to the element you want to scrape:

const axios = require('axios'); 
const cheerio = require('cheerio'); 

const extractLinks = $ => [
    ...new Set( 
        $("#item-container > div.remove-panel.section-content.top-section > div.border-bottom.item-name-container > h1") // Select the item name <h1>
            .toArray() // Convert cheerio object to an array of nodes
    ), 
]; 

axios.get('https://www.roblox.com/catalog/21070012/Dominus-Empyreus').then(({ data }) => { 
    const $ = cheerio.load(data); // Initialize cheerio 
    const stuff = extractLinks($);

    console.log(stuff[0].children[0].data); // Text of the first child node of the <h1>
    // Dominus Empyreus
});
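
The expression stuff[0].children[0].data works because the first child of the <h1> is a text node. As a rough sketch of a more forgiving approach, the helper below walks a node and joins all the text it finds. The name textOf is hypothetical, and the object is a hand-written stand-in for a parsed element (text nodes carry data, element nodes carry children), not real output from the Roblox page.

```javascript
// Collect the text of a node and all of its descendants.
// Assumes the node shape returned by .toArray():
// text nodes have `data`, element nodes have `children`.
function textOf(node) {
  if (node.type === 'text') return node.data;
  return (node.children || []).map(textOf).join('');
}

// Hand-written stand-in for the parsed <h1>; a real page may nest more markup.
const h1 = {
  type: 'tag',
  name: 'h1',
  children: [{ type: 'text', data: 'Dominus Empyreus' }],
};

console.log(textOf(h1).trim()); // Dominus Empyreus
```

In practice cheerio's own .text() method, as in $(selector).text(), returns the combined text of the matched elements and is usually the simpler choice.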

A quick way to get a full selector path is to open your browser's developer tools (F12), right-click the element in the inspector, and choose Copy -> Copy selector.

In this case, the selector for this <h1> element is:

#item-container > div.remove-panel.section-content.top-section > div.border-bottom.item-name-container > h1

Note that this solution only works if the selector path is exactly the same on every page you want to scrape.
