I'm trying to extract the src URL/path without the quotes, but only when it points to an image:
- src="/path/image.png" // should capture => /path/image.png
- src="/path/image.bmp" // should capture => /path/image.bmp
- src="/path/image.jpg" // should capture => /path/image.jpg
- src="https://www.site1.com" // should NOT capture
So far I have /src="(.*)"/g, but that obviously captures both the image paths and the plain URL. I have been looking at lookbehind and lookahead but just can't put it together.
CodePudding user response:
You can use a capture group, and you should prevent the match from crossing the closing " by using a negated character class.
If you want to match either href or src:
\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"
Explanation
- \b : A word boundary to prevent a partial word match
- (?:href|src)=" : Match either href= or src= followed by the opening "
- ( : Capture group 1
- [^\s"]* : Match optional chars other than a whitespace char or "
- \.(?:png|jpg|bmp) : Match one of .png, .jpg or .bmp
- ) : Close group 1
- " : Match the closing " literally
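// Capture group 1 holds the path; [^\s"]* keeps the match inside the quotes.
const regex = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/;
[
  'src="/path/image.png" test "',
  'src="/path/image.bmp"',
  'src="/path/image.jpg"',
  'src="https://www.site1.com"',
  'href="image.png"'
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // m[1] is the path without the quotes; the site1.com entry does not match
    console.log(m[1]);
  }
});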
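The snippet above tests one string at a time. To pull every image path out of a larger piece of markup, the same pattern can be given the g flag and used with String.prototype.matchAll. This is only a sketch: the function name extractImagePaths and the sample markup are made up for illustration.

```javascript
// Hypothetical helper (not from the original answer): returns every
// href/src value that ends in one of the listed image extensions.
function extractImagePaths(html) {
  // Same pattern as above, plus the g flag that matchAll requires.
  const regex = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/g;
  // matchAll yields one match object per occurrence; m[1] is capture group 1.
  return Array.from(html.matchAll(regex), m => m[1]);
}

// Made-up sample markup for illustration.
const sample =
  '<img src="/path/image.png">' +
  '<a href="https://www.site1.com">link</a>' +
  '<img src="/path/image.jpg">';

const paths = extractImagePaths(sample);
console.log(paths);
```

For real HTML documents a DOM parser is usually more robust than a regex; the pattern is best suited to simple, predictable input like the strings in the question.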
CodePudding user response:
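Try /src="([^"]*\.(?:jpg|bmp|png))"/g
You'll need to list every extension you consider a valid image inside the (?:...) group. Square brackets do not work for this: [jpg|bmp|png] is a character class that matches a single character, not one of the three extensions.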
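A quick way to see why the extension list belongs in a group rather than in square brackets is to run both forms against a value that is not an image. The sketch below uses made-up test strings; the variable names are only for illustration.

```javascript
// Character class: [jpg|bmp|png] matches ONE character out of j, p, g, |, b, m, n.
const charClass = /src="(.*[jpg|bmp|png])"/;
// Alternation: matches a literal dot followed by one of the three extensions.
const alternation = /src="([^"]*\.(?:jpg|bmp|png))"/;

// Made-up inputs for illustration.
const notAnImage = 'src="/path/backup"'; // ends in "p", which is in the class
const image = 'src="/path/image.png"';

const charClassHit = charClass.test(notAnImage);     // true: false positive
const alternationHit = alternation.test(notAnImage); // false: no image extension
const imagePath = alternation.exec(image)[1];        // the captured path

console.log(charClassHit, alternationHit, imagePath);
```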
CodePudding user response:
If you want it to be a bit more foolproof, you can use lookbehinds and lookaheads, so that the whole match is the path itself and no capture group is needed. Expand the extension list png|bmp|jpg to test for more extensions.
/(?<=src=")[^"]*\.(?:png|bmp|jpg)(?=")/g
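With lookarounds the quotes are checked but not consumed, so String.prototype.match with the g flag returns the bare paths. The sketch below compares a .* version with a [^"]* version on a made-up tag that has a second quoted attribute; the greedy .* can run past the closing quote of src. Note that lookbehind is not available in some older JavaScript engines.

```javascript
// Greedy version: .* can run across the closing quote of the src attribute.
const greedy = /(?<=src=").*(png|bmp|jpg)(?=")/g;
// Bounded version: [^"]* cannot leave the quoted value.
const bounded = /(?<=src=")[^"]*\.(?:png|bmp|jpg)(?=")/g;

// Made-up tag for illustration; alt also happens to end in an image extension.
const tag = '<img src="/path/image.png" alt="photo.jpg">';

const greedyResult = tag.match(greedy);   // one match that spills into alt
const boundedResult = tag.match(bounded); // just the src path

console.log(greedyResult);
console.log(boundedResult);
```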
CodePudding user response:
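Try this:
src="(.*image.*)"
Note that this only matches when the path itself contains the text image; it does not check the file extension.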