How do I get text after single <br> tag in Cheerio

Time:11-13

I'm trying to get some text using Cheerio that is placed after a single <br> tag.

I've already tried the following lines:

let price = $(this).nextUntil('.col.search_price.discounted.responsive_secondrow').find('br').text().trim();
let price = $(this).nextUntil('.col.search_price.discounted.responsive_secondrow.br').text().trim();

Here is the HTML I'm trying to scrape:

<div  data-price-final="5039">
  <div >
    <span>-90%</span>
  </div>
  <div class="col search_price discounted responsive_secondrow">
    <span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39     
  </div>
</div>

I would like to get "ARS$ 50,39".

CodePudding user response:

If you're comfortable assuming this text is the last child node of the element, you can use .contents().last():

const cheerio = require("cheerio"); // 1.0.0-rc.12

const html = `
<div  data-price-final="5039">
  <div >
    <span>-90%</span>
  </div>
  <div class="col search_price discounted responsive_secondrow">
    <span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39     
  </div>
</div>
`;
const $ = cheerio.load(html);
const sel = ".col.search_price.discounted.responsive_secondrow";
const text = $(sel).contents().last().text().trim();
console.log(text); // => ARS$ 50,39

If you aren't comfortable with that assumption, you can search through the children to find the first non-empty text node:

// ...
const text = $([...$(sel).contents()]
  .find(e => e.type === "text" && $(e).text().trim()))
  .text()
  .trim();
console.log(text); // => ARS$ 50,39

If it's critical that the text node immediately follows a <br> tag specifically, you can try:

// ...
const contents = [...$(sel).contents()];
const text = $(contents.find((e, i) =>
    e.type === "text" && contents[i-1]?.tagName === "br"
  ))
  .text()
  .trim();
console.log(text); // => ARS$ 50,39

If you want all of the immediate text children, see:

CodePudding user response:

You should be able to get the price by using:

$('.col.search_price.discounted.responsive_secondrow').html().trim().split('<br>')[1]

This gets the inner HTML of the element, trims the surrounding whitespace, splits on the <br>, and takes the second part (index 1).

See example at https://jsfiddle.net/b7nt0m24/3/ (note: uses jquery which has a similar API to cheerio)
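To see the split step in isolation, here is a small sketch that runs on a plain string, with no Cheerio or jQuery involved. The innerHtml value is copied from the question's markup and stands in for what .html() would return. The regular expression is my own addition, on the assumption that a serializer might emit <br/> or <br /> instead of <br>:

```javascript
// Stand-in for the element's inner HTML (copied from the question).
const innerHtml = `
    <span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39
  `;

// Split on any spelling of the line break: <br>, <br/>, <br />.
// parts[0] is the struck-through old price markup, parts[1] is the current price.
const parts = innerHtml.trim().split(/<br\s*\/?>/i);
const price = parts[1].trim();

console.log(price); // => ARS$ 50,39
```

This approach is tied to the exact markup: if the element ever contains a second <br>, or the price gets wrapped in its own tag, index 1 no longer holds just the price text.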
