I use the Cheeriogs library for scraping:
I'd like to make the selector more robust, so I tried using:
div.schema > div > div.tnms > div > a:contains("/en/predictions-tips")
That didn't work. How should I use :contains for this?
Answer:
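In your situation, :contains() won't help: it matches an element's text content, not its attribute values, so it can't test href. How about using attribute selectors like the following instead?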
From:
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn');
To:
const scrapurl = $('a.tnmscn[href^="/en/predictions"]');
or
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn[href^="/en/predictions"]');
or
const scrapurl = $('div.schema > div > div.tnms > div > a[href^="/en/predictions"]');
- With each of the modified selectors above, /en/predictions-tips-wealdstone-solihull-moors-1455115 is retrieved.
- These selectors match an a tag (or an a tag with the class tnmscn) whose href starts with /en/predictions.
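To show why :contains() can't do this job, here is a minimal plain-JavaScript model of three matching rules. It is not Cheerio itself: the anchor objects and their link text are hypothetical stand-ins for the page's a tags (only the first href comes from this page's result; the second is just an example path). The third rule, [href*="..."], is the standard CSS substring attribute selector, which is probably the closest match to what the question wanted from contains, e.g. a.tnmscn[href*="/en/predictions-tips"].

```javascript
// Hypothetical stand-ins for <a> elements; the link text values are invented.
const anchors = [
  {
    className: "tnmscn",
    href: "/en/predictions-tips-wealdstone-solihull-moors-1455115",
    text: "Wealdstone vs Solihull Moors",
  },
  { className: "tnmscn", href: "/en/teams/wealdstone", text: "Wealdstone" },
];

// a:contains("x")  -> the element's text includes x; attributes are ignored.
const byContains = (list, s) => list.filter((a) => a.text.includes(s));

// a[href^="x"]     -> the href attribute starts with x.
const byHrefPrefix = (list, s) => list.filter((a) => a.href.startsWith(s));

// a[href*="x"]     -> the href attribute contains x anywhere.
const byHrefSubstring = (list, s) => list.filter((a) => a.href.includes(s));

const hrefs = (list) => list.map((a) => a.href);

// Matches nothing: the path only appears in href, never in the link text.
console.log(hrefs(byContains(anchors, "/en/predictions-tips")));
// Matches only the prediction link.
console.log(hrefs(byHrefPrefix(anchors, "/en/predictions")));
// Also matches only the prediction link, wherever the substring appears in href.
console.log(hrefs(byHrefSubstring(anchors, "predictions-tips")));
```

The prefix form (^=) is stricter than the substring form (*=), which is why the selectors above use it when the path always starts the same way.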
However, with the URL you are using, two links match. Granitosaurus's comment already pointed this out. So if you only want the first value, the modifications above can be used in your script as-is.
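A rough model of why the number of matches matters, assuming the usual Cheerio behavior: .attr(name) reads the attribute of only the first element in a selection, while .each() visits every element. Plain arrays stand in for the selection here, and firstHref/allHrefs are hypothetical helper names; the two hrefs are the ones this page's modified script outputs.

```javascript
// Plain-array stand-in for a Cheerio selection holding the two matched links.
const matched = [
  "/en/predictions-tips-wealdstone-solihull-moors-1455115",
  "/en/predictions-tips-crawley-town-leyton-orient-1474259",
].map((href) => ({ href }));

// Like $(selector).attr("href"): first element only, undefined if nothing matched.
const firstHref = (selection) =>
  selection.length > 0 ? selection[0].href : undefined;

// Like looping with .each(): collect the href of every matched element.
const allHrefs = (selection) => selection.map((el) => el.href);

console.log(firstHref(matched)); // only the Wealdstone link
console.log(allHrefs(matched)); // both links
```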
If you want to retrieve both values, how about the following modification?
Modified script:
Any of the modified selectors above can also be used in this script.
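const url = "https://www.forebet.com/en/teams/wealdstone";
const contentText = UrlFetchApp.fetch(url).getContentText(); // fetch the page HTML
const $ = Cheerio.load(contentText); // parse it with Cheeriogs
// Every <a class="tnmscn"> under the given path whose href starts with "/en/predictions".
const scrapurl = $('div.schema > div > div.tnms > div > a.tnmscn[href^="/en/predictions"]'); // or simply a.tnmscn[href^="/en/predictions"]
// .each() visits every matched link, not just the first one.
scrapurl.each(function () {
  const urlmatch = $(this).attr('href');
  console.log(urlmatch);
});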
When this script is run, the following result is obtained.
/en/predictions-tips-wealdstone-solihull-moors-1455115
/en/predictions-tips-crawley-town-leyton-orient-1474259