Let's say message.getBody() returns this result as:
<p>Stejně jako došlo ke změně stezek, pracuje se také na aktualizaci odborek.</p>
<p>Aktuální podobu odborek zde nabízíme ke stažení:</p>
<table>
<tbody>
<tr>
<td><a href="dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky">Odborky pro skauty a skautky</a></td>
<td><a href="dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky"><img src="http://static.krizovatka.skaut.cz/noRW_photodisplay/7223_9bd54a63a4719dc5fb347735fa81cd27.jpg?1324651878" alt="" height="80" width="80" /></a></td>
<td>Kompletní odborky. Rok vydání 1998. <a href="dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky">Stahujte</a>, <a href="http://www.obchod.skaut.cz/publikace/Odborne-zkousky-skautek-a-skautu.html">Kupte si</a></td>
</tr>
</tbody>
</table>
We can use the regular expression href=\"(.*?)\" to extract "dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky" from:
<td><a href="dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky"><img src="http://static.krizovatka.skaut.cz/noRW_photodisplay/7223_9bd54a63a4719dc5fb347735fa81cd27.jpg?1324651878" alt="" height="80" width="80" /></a></td>
Question: how can we do the same when message.getBody() returns a single-line string such as:
<!doctype html><html><div style="width: 100%; max-width: 650px;"><div style="font-family: 'Arial';"><table style="border-collapse: collapse; border-left: 1px solid #e4e4e4; border-right: 1px solid #e4e4e4;"><tr><td style="background-color: #f8f8f8; border-bottom: 1px solid #e4e4e4; border-top: 1px solid #e4e4e4; padding-left: 18px;"></td><td style="background-color: #f8f8f8; border-bottom: 1px solid #e4e4e4; border-top: 1px solid #e4e4e4; padding: 18px 10px 9px 0px;" valign=middle><a style="text-decoration: none;" href=//trends.ama.com/trends?utm_source=storyfinder&utm_medium=email&utm_campaign=v1&utm_content=logo><img alt="Ama Trend" border=0 height=24 src=https://www.gstatic.com/images/branding/lockups/1x/lockup_trends_color_142x24dp.png></a></td><td style="background-color: #f8f8f8; border-bottom: 1px solid #e4e4e4; border-top: 1px solid #e4e4e4; padding-right: 18px;"></td></tr><tr><td style="padding-left: 32px;"></td><td style="font-family: 'Arial'; line-height: 20px; padding: 29px 0 0 0; vertical-align: middle;"><span style="color: #1f1f1f; font-size: 22px;"> is trending today on Ama.</span><div style="color: #666; font-size: 13px; line-height: 16px; padding-top: 6px; vertical-align: top;"><span>Taiwan</span><span style="padding: 0px 4px 0px 4px">⋅</span><a style="color: #aaa; text-decoration: none;">Saturday, December 18, 2021</a></div></td><td style="padding-right: 32px;"></td></tr><tr><td colspan=3 style="height: 15px;"></td></tr><tr><td style="padding-left: 18px;"></td><td style="border-top: 1px solid #e4e4e4 ; font-family: 'Arial'; padding: 18px 0 12px 0; vertical-align: top;"><a style="text-decoration: none;" href=https://feed.ama.com/g/p/AD-FnEwizam9DvKiTR8XdAnbGCe0o6Dng65WkxklEQ6P9ueoMRpGG00NLVX_PA5lHPz6DeGQvaURUnCuEI9PzRQNQHR1-t1Elpru3e1nbbJlhlUZ3FbFv2am><table align=right style="display: inline; border-collapse: collapse; padding-bottom: 5px;"><tr><td style="padding-left: 18px;"></td><td style="background-repeat: no-repeat; border: 1px solid #e4e4e4; 
padding: 0; text-align: center;" valign=bottom background=https://t0.gstatic.com/images?q=tbn:ANd9GcR2Ig9w7TRcJptTamszZOz9ZusluWcuTdwEcZbeLb31PgquvyjBmrje4WMzBA8f8SUYbFpHBX0k height=100 width=100><!--[if gte mso 9]><v:image xmlns:v="urn:schemas-microsoft-com:vml" id="theImage" style='behavior: url(#default#VML); display:inline-block; position:absolute; height: 100px; width: 100px; top:0; left:0; border:0; z-index:1;'</div></td><td style="padding-right: 32px;"></td></tr></table></div></div><img alt="" height=1 width=3 src=https://feed.ama.com/g/img/AD-FnEyO4-BP639s1TuG_VZB2QMQM1OiAziES42-qyAMNpSqbMY.gif></html>
Calling the following function on that body returns null in Google Apps Script:
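....
str = message.getBody()
result = myFunction(str)
function myFunction(str){
  // This pattern only matches href values wrapped in double quotes.
  var re = /href=\"(.*?)\"/g;
  var result = "";
  var res;
  while ((res = re.exec(str)) !== null) {
    // Append each captured link followed by a newline.
    result += res[1] + "\n";
  }
  return result.slice(0,-1); // drop the trailing newline
}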
How can we make this regex work in Google Apps Script?
CodePudding user response:
You can use
const myFunction = (str, regex) => {
const result = Array.from(str.matchAll(regex), (x) => x[2] ?? x[1]);
return result.join("\n");
}
const str = message.getBody()
const regex = /\shref=("([^"]*)"|[^\s>]+)/gi;
result = myFunction(str, regex)
Details:
\s - a whitespace
href= - a fixed string
("([^"]*)"|[^\s>]+) - Group 1, which matches either of:
  "([^"]*)" - a " char, then Group 2 (any zero or more chars other than "), and then a closing " that is outside of Group 2
  | - or
  [^\s>]+ - one or more chars other than whitespace and >
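To make the group breakdown concrete, here is a small sketch of what each capture group holds for a quoted and an unquoted href. The two inputs are shortened from the HTML in the question, and the helper name describeMatch is hypothetical, introduced only for illustration.

```javascript
// Same pattern as in the answer: Group 1 is the whole value, Group 2 the text inside quotes.
const hrefRegex = /\shref=("([^"]*)"|[^\s>]+)/gi;

// Shortened sample inputs (assumptions based on the question's HTML).
const quoted = '<a href="dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky">';
const unquoted = '<a style="text-decoration: none;" href=//trends.ama.com/trends?utm_source=storyfinder>';

// Hypothetical helper: returns the capture groups of the first match.
const describeMatch = (input) => {
  const m = [...input.matchAll(hrefRegex)][0];
  return { group1: m[1], group2: m[2] };
};

const quotedGroups = describeMatch(quoted);
const unquotedGroups = describeMatch(unquoted);

console.log(quotedGroups);   // group1 keeps the quotes, group2 is the bare link
console.log(unquotedGroups); // group2 is undefined, group1 is the bare link
```

This is why the callback uses x[2] ?? x[1]: Group 2 is preferred when the value was quoted, and Group 1 is the fallback for unquoted values, which the original href=\"(.*?)\" pattern cannot match at all.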
See the JavaScript demo:
const myFunction = (str, regex) => {
const result = Array.from(str.matchAll(regex), (x) => x[2] ?? x[1]);
return result.join("\n");
}
const text = `<tr>
<td><a href="dokumenty/file/517-odborne-zkousky-pro-skauty-a-skautky">Odborky pro skauty a skautky</a></td>...<img ...valign=middle><a style="text-decoration: none;" href=//trends.ama.com/trends?utm_source=storyfinder&utm_medium=email&utm_campaign=v1&utm_content=logo><img alt="`;
const regex = /\shref=("([^"]*)"|[^\s>]+)/gi;
console.log(myFunction(text, regex));
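If matchAll or the ?? operator is not available in the script's runtime (an assumption about older environments, not something stated in the question), the same pattern can be used with the exec loop from the question. This is only a sketch: the function name extractHrefs and the sample string are hypothetical.

```javascript
// Hypothetical variant of the question's function, using exec instead of matchAll.
function extractHrefs(str) {
  var re = /\shref=("([^"]*)"|[^\s>]+)/gi;
  var links = [];
  var res;
  while ((res = re.exec(str)) !== null) {
    // Prefer Group 2 (quoted value without quotes); fall back to Group 1 (unquoted value).
    links.push(res[2] !== undefined ? res[2] : res[1]);
  }
  return links.join("\n");
}

// Made-up sample mixing a quoted and an unquoted href.
var sample = '<a href="a/b">x</a><a style="text-decoration: none;" href=//trends.ama.com/trends?utm_content=logo><img>';
var extracted = extractHrefs(sample);
console.log(extracted);
```

Collecting the matches in an array and joining them at the end avoids the trailing-newline cleanup that the slice(0,-1) call handled in the question's version.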