Scraping a url value using contains and Cheeriogs

Time:10-06

I use the Cheeriogs library for scraping.

I'd like to make the selector more robust, so I tried using:

div.schema > div > div.tnms > div > a:contains("/en/predictions-tips")

That didn't work. How should I use contains for this?

Additional info:

Page Link

CodePudding user response:

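In your situation, how about the following selectors? :contains() matches against an element's text content, not its attribute values, so it cannot test the href. An attribute selector such as [href^="..."] can.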

From:

const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn');

To:

const scrapurl = $('a.tnmscn[href^="/en/predictions"]');

or

const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn[href^="/en/predictions"]');

or

const scrapurl = $('div.schema > div > div.tnms > div > a[href^="/en/predictions"]');
  • With each of the modified selectors above, /en/predictions-tips-wealdstone-solihull-moors-1455115 is retrieved.
  • These selectors match a tags (optionally restricted to the class tnmscn) whose href starts with /en/predictions.
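To illustrate what the attribute operators test, here is a minimal plain-JavaScript sketch that mirrors ^= (starts with), *= (contains) and $= (ends with) using string methods. The helper name matchesAttr is hypothetical and this is not how Cheerio implements selectors internally; it only shows the comparison each operator performs on the href value.

```javascript
// Hypothetical helper: mimics how the attribute operators ^=, *= and $=
// compare an attribute value with the quoted string in the selector.
// This is an illustration only, not Cheerio's actual implementation.
function matchesAttr(value, operator, expected) {
  if (typeof value !== 'string') return false; // attribute is missing
  switch (operator) {
    case '^=': return value.startsWith(expected); // prefix match
    case '*=': return value.includes(expected);   // substring match
    case '$=': return value.endsWith(expected);   // suffix match
    default: throw new Error('Unsupported operator: ' + operator);
  }
}

// The first two hrefs come from the result shown in this answer.
// The third is the path of the page URL, used here as a non-matching sample.
const sampleHrefs = [
  '/en/predictions-tips-wealdstone-solihull-moors-1455115',
  '/en/predictions-tips-crawley-town-leyton-orient-1474259',
  '/en/teams/wealdstone',
];

// Equivalent in spirit to a[href^="/en/predictions"].
const predictionLinks = sampleHrefs.filter(function (href) {
  return matchesAttr(href, '^=', '/en/predictions');
});
console.log(predictionLinks);
```

If the goal is really "href contains this text", as in the original attempt with contains, the *= operator (for example a[href*="/en/predictions-tips"]) is the attribute-based way to express it.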

However, the URL you are using yields 2 matching values, as Granitosaurus already mentioned in a comment. So if you only want the 1st value, the modified selectors above can be used in your script as they are.

If you want to retrieve both values, how about the following modification?

Modified script:

Any of the modified selectors above can also be used in this modification.

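// Fetch the team page and load the HTML into Cheerio (Cheeriogs library).
const url = "https://www.forebet.com/en/teams/wealdstone";
const contentText = UrlFetchApp.fetch(url).getContentText();
const $ = Cheerio.load(contentText);
// Match links whose href starts with "/en/predictions".
// The shorter selector a.tnmscn[href^="/en/predictions"] can also be used.
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn[href^="/en/predictions"]');
// scrapurl is already a Cheerio object, so each() can be called on it directly.
scrapurl.each(function() {
  const urlmatch = $(this).attr('href');
  console.log(urlmatch);
});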
  • When this script is run, the following result is obtained.

      /en/predictions-tips-wealdstone-solihull-moors-1455115
      /en/predictions-tips-crawley-town-leyton-orient-1474259
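The retrieved values are root-relative paths. If they need to be fetched afterwards, they have to be turned into absolute URLs first. The following is a small sketch under that assumption; toAbsoluteUrl is a hypothetical helper, and the origin is taken from the URL used in the script above. It only handles absolute and root-relative hrefs like the ones in this result, not page-relative paths such as ../x.

```javascript
// Hypothetical helper: joins a site origin and a root-relative href.
// Assumes hrefs are either already absolute or relative to the site root,
// as in the result shown above.
function toAbsoluteUrl(origin, href) {
  if (/^https?:\/\//i.test(href)) return href; // already absolute
  const base = origin.replace(/\/+$/, '');     // drop trailing slashes
  const path = href.charAt(0) === '/' ? href : '/' + href;
  return base + path;
}

// Origin of the page URL used in the script above.
const siteOrigin = 'https://www.forebet.com';

// The two hrefs from the result above.
const scrapedHrefs = [
  '/en/predictions-tips-wealdstone-solihull-moors-1455115',
  '/en/predictions-tips-crawley-town-leyton-orient-1474259',
];

const absoluteUrls = scrapedHrefs.map(function (href) {
  return toAbsoluteUrl(siteOrigin, href);
});
console.log(absoluteUrls);
```

Plain string handling is used here so that the sketch does not depend on any URL-parsing API being available in the runtime.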
    
