I hit a snag. This code snippet works great to get film ratings when they exist, but it errors out when it reaches a record that doesn't match the regex: "TypeError: Cannot read property '0' of null"
const ratingPass1 = /<span >/g;
const ratingPass2 = /(?<=<span >)/g;
for (var i = 0; i < 18; i++) {
  var rating1String = results[i].match(ratingPass1);
  Logger.log('content: ' + rating1String[0]);
  var rating2String = rating1String[0].match(ratingPass2);
  Logger.log('content: ' + rating2String[0]); // --> error is here
}
I'm too new to JavaScript to know how to implement an 'includes' or 'contains' or something of that ilk in this code. But I'm getting not too bad with regex, and figured I might be able to turn the regex into one large non-capturing group with the capturing group inside it, so I tried:
const ratingPass1 = /(?:<span >)/g;
var rating1String = results[i].match(ratingPass1);
Logger.log('content: ' + rating1String[0]);
but I keep getting the error, and I guess I should, because I'm still saying "find it, but exclude it", whereas I need "if you don't find it, just ignore it". Maybe it's the "match" in
var rating1String = results[i].match(ratingPass1);
Logger.log('content: ' + rating1String[0]);
that could be changed to say something like match OR ignore if null?
Update: It took quite a few hours, but I figured something out. Might just work by some fluke, but at least it works!
I replaced the variables and logging info with the following:
var rating0String = "";
var rating1String = results[i].match(ratingPass1);
if (!ratingPass1) {
  Logger.log('content: ' + rating0String);
} else {
  Logger.log('content: ' + rating1String);
}
var rating2String = results[i].match(ratingPass2);
if (!ratingPass2) {
  Logger.log('content: ' + rating0String);
} else {
  Logger.log('content: ' + rating2String);
}
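A note on why this runs: ratingPass1 and ratingPass2 are the regex objects themselves, which are always truthy, so the if branches never execute. The error disappears only because nothing is indexed with [0] any more. A minimal sketch of testing the match result instead follows; the helper name firstMatchOrEmpty, the sample pattern, and the sample records are all assumptions made up for illustration, not the real pattern from the script.

```javascript
// Hypothetical helper: returns the first match of `regex` in `text`,
// or an empty string when there is no match (match() returns null then).
function firstMatchOrEmpty(text, regex) {
  const found = text.match(regex);
  if (found === null) {
    return '';
  }
  return found[0];
}

// Sample pattern and records are assumptions for illustration only.
const samplePattern = /rated-\d+/;
const sampleRecords = [
  '<li>film A rated-8</li>',
  '<li>film B, no rating</li>',
];

// One entry per record; unrated records yield '' instead of throwing.
const sampleRatings = sampleRecords.map(r => firstMatchOrEmpty(r, samplePattern));
console.log(sampleRatings);
```

In Apps Script the same helper could be called once per results[i], with Logger.log in place of console.log.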
CodePudding user response:
Using two regular expressions that match the same text twice makes little sense, especially since your first regex already contains a capturing group around the pattern part you want to extract. Just use the index of the capture on the match object.
You need to use
const ratingPass = /<span >/g;
for (const result of results) {
  const matches = result.matchAll(ratingPass);
  for (const match of matches) {
    Logger.log('rating1String: ' + match[0]);
    Logger.log('rating2String: ' + match[1]);
  }
}
Here,
- <span > matches <span >
- for (const result of results) {...} iterates over some results array
- const matches = result.matchAll(ratingPass) gets all matches per result string
- for (const match of matches) {...} iterates over the matches found
- match[0] is the whole match value, match[1] is the part captured into Group 1.
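As a rough, self-contained illustration of the loop described above, here is a sketch that can be run outside Apps Script. The pattern, its class names, and the sample records are assumptions invented for the example (the thread's real pattern is not reproduced), and console.log stands in for Logger.log.

```javascript
// Hypothetical pattern: the class names are an assumption for illustration.
// Group 1 captures the digits after "rated-". matchAll requires the g flag.
const demoPattern = /<span class="rating rated-(\d+)">/g;

// Made-up records; the second one contains no rating at all.
const demoResults = [
  '<li><span class="rating rated-8"></span></li>',
  '<li>no rating here</li>',
  '<li><span class="rating rated-10"></span></li>',
];

const collected = [];
for (const result of demoResults) {
  // matchAll returns an iterator; with no matches the inner loop body
  // never runs, so nothing is ever indexed on null.
  for (const match of result.matchAll(demoPattern)) {
    collected.push({ whole: match[0], group1: match[1] });
  }
}
console.log(collected);
```

The record without a rating simply contributes nothing, which is the "ignore it if it isn't there" behaviour the question asks for.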
Update after you shared the script
function DiaryImportMain() {
DiaryImportclearRecords();
const url = "https://letterboxd.com/tag/30-countries-2021/diary/";
const str = UrlFetchApp.fetch(url).getContentText();
const mainRegex = /<li >([\s\S]*?)<\/li>/gi;
const results = str.match(mainRegex);
const filmTitlePass = /height="225" alt="([\s\S]*?)"\/>/i;
const usernamePass = /<strong ><a href="\/(?:[\s\S]*?)\/">([\s\S]*?)<\/a><\/strong>/i;
const ratingPass = /<span >/i;
for (var i = 0; i < 18; i++) {
  Logger.log('content: ' + results[i]);
  // Fall back to ['', ''] when match() returns null, so [0] and [1] are always safe to read.
  const filmTitle = (results[i].match(filmTitlePass) || ['', '']);
  const filmTitle1String = filmTitle[0];
  Logger.log('content: ' + filmTitle1String);
  const filmTitle2String = filmTitle[1];
  Logger.log('content: ' + filmTitle2String);
  const username = (results[i].match(usernamePass) || ['', '']);
  const username1String = username[0];
  Logger.log('content: ' + username1String);
  const username2String = username[1];
  Logger.log('content: ' + username2String);
  const rating = (results[i].match(ratingPass) || ['', '']);
  const rating1String = rating[0];
  Logger.log('content: ' + rating1String);
  const rating2String = rating[1];
  Logger.log('content: ' + rating2String);
  DiaryImportaddRecord(i + 1, filmTitle2String, username2String, rating2String);
}
}
CodePudding user response:
This can be done effectively with the Cheerio library; see the explanatory comments in the code:
function matchRating()
{
// TODO replace html with your data
const html = '<div><span ></span><span ></span><span ></span></div>';
// create Cheerio object
const $ = Cheerio.load(html);
const ratingPrefixForClass = 'rated-';
// select all spans with `rating` class
$(".rating").each((i, el) => {
let classAttr = $(el).attr('class');
// split class attribute to get list of class names, find one with needed prefix
let ratingClassSearch = classAttr.split(' ').find(cls => cls.indexOf(ratingPrefixForClass) === 0);
// if needed class with prefix found, log its name, and its name without prefix
if (ratingClassSearch)
{
console.log(ratingClassSearch, ratingClassSearch.substring(ratingPrefixForClass.length));
}
});
}
Main points:
- Do not use regex for parsing HTML.
- This uses the Cheerio JS library ported for Google Apps Script. To install it, you need to add it as a dependency.
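The class-prefix lookup inside the each callback does not itself depend on Cheerio, so it can be tried in isolation. The sketch below assumes a hypothetical helper name, ratingFromClassAttr, and made-up class attribute values; it only mirrors the split-and-find step, not the HTML parsing.

```javascript
// Hypothetical helper mirroring the class-prefix lookup, without Cheerio.
function ratingFromClassAttr(classAttr, prefix) {
  // Split on whitespace and find the first class name starting with the prefix.
  const found = classAttr.split(/\s+/).find(cls => cls.startsWith(prefix));
  // Return the part after the prefix, or '' when no such class exists.
  return found ? found.substring(prefix.length) : '';
}

// Sample attribute values are assumptions for illustration only.
console.log(ratingFromClassAttr('rating rated-8', 'rated-')); // '8'
console.log(ratingFromClassAttr('rating', 'rated-'));         // ''
```

Returning an empty string for an unrated entry keeps the caller free of null checks, at the cost of not distinguishing "no rating" from an empty suffix.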