I need to extract image URLs from a string like the one below:
let str = 'any words abc sdf sdf http://test/files/1/1574322005738295297/2222681.jpeghttp://test/files/1/2040655876098/Image131607.png';
I need to get the image URLs like this:
const expectedImgList = ['http://test/files/1/1574322005738295297/2222681.jpeg','http://test/files/1/2040655876098/Image131607.png'];
How can I achieve that with a JavaScript RegExp?
Answer:
I have made the following assumptions:
a. all files are accessed using http://, not https://
b. the concatenation of two image URLs without a space between them in your example was deliberate
c. every image path is followed by either another image path or a space character
Based on these, I offer the following code, which uses a modified version of your sample str:
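let str = 'any words abc sdf http://test/y.jpg sdf http://test/2222681.jpeghttp://test/Image131607.png http://test/x.jpg';
// every run of non-space characters that starts with http://
let regex = /\bhttp:\/\/[^ ]*/g;
let found = str.match(regex) || []; // match() returns null when nothing is found
let result = [];
for (let f = 0; f < found.length; f++) {
  // a single run may contain several URLs glued together, so split it on "http://"
  let found_inner = found[f].split('http://');
  // found_inner[0] is always "" because every run starts with "http://"
  for (let fi = 1; fi < found_inner.length; fi++) {
    result.push('http://' + found_inner[fi]);
  }
}
console.log(result);
// result: ['http://test/y.jpg', 'http://test/2222681.jpeg', 'http://test/Image131607.png', 'http://test/x.jpg']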
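Under the same assumptions a. to c., the loop above can be condensed with flatMap: each http:// run is split on "http://", the empty first piece is dropped, and the prefix is put back. This is a sketch assuming an ES2019+ runtime (for Array.prototype.flatMap); the function name splitConcatenatedUrls is hypothetical.

```javascript
// Hypothetical helper; relies on assumptions a. to c. above
// (http:// only, URLs separated by a space or glued directly together).
function splitConcatenatedUrls(str) {
  // Each match is a run of non-space characters starting with "http://".
  const runs = str.match(/\bhttp:\/\/[^ ]*/g) || [];
  return runs.flatMap(run =>
    run
      .split('http://') // first element is "" because the run starts with "http://"
      .slice(1)
      .map(rest => 'http://' + rest)
  );
}

const str = 'any words abc sdf http://test/y.jpg sdf http://test/2222681.jpeghttp://test/Image131607.png http://test/x.jpg';
const urls = splitConcatenatedUrls(str);
// urls: ['http://test/y.jpg', 'http://test/2222681.jpeg', 'http://test/Image131607.png', 'http://test/x.jpg']
console.log(urls);
```

Like the loop version, this only recognizes http:// and treats a space as the only separator besides the next "http://".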
Answer:
It looks like you have multiple URLs concatenated without a space. Here is a solution that assumes a split should happen between a word character (the last character of the previous URL) and the beginning of the next URL (http:// or https://):
const str = 'any words abc sdf sdf http://test/files/1/1574322005738295297/2222681.jpeghttp://test/files/1/2040655876098/Image131607.png';
const regex1 = /(\w)(?=https?:\/\/)/g;
const regex2 = /[ "']/;
const regex3 = /^https?:\/\//;
let result = str
.replace(regex1, '$1 ') // add space between URLs
.split(regex2) // split on space and quotes (in case of URL in HTML)
.filter(s => regex3.test(s)); // keep only URLs
console.log(result);
Output:
[
"http://test/files/1/1574322005738295297/2222681.jpeg",
"http://test/files/1/2040655876098/Image131607.png"
]
Explanation of regex1:
(\w) -- capture group 1: one word character
(?=https?:\/\/) -- positive lookahead for http:// or https://
g flag: global, i.e. replace all occurrences, not just the first
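To see what regex1 actually does, here is the intermediate string it produces before the split. The sample string is a shortened, hypothetical version of the one in the question.

```javascript
// Illustration only: shows the intermediate string produced by regex1.
const sample = 'sdf http://test/a.jpeghttp://test/b.png';
const regex1 = /(\w)(?=https?:\/\/)/g;
// '$1 ' writes the captured word character back, followed by a space.
const spaced = sample.replace(regex1, '$1 ');
console.log(spaced); // sdf http://test/a.jpeg http://test/b.png
```

Because the character before the next URL must be a word character, a URL ending in a non-word character such as / that is glued to the next one will not be separated by this step.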
Explanation of regex2:
[ "'] -- one of: a space, ", or '
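The quote characters in regex2 matter when the input is HTML. A small sketch with a hypothetical snippet containing one double-quoted and one single-quoted attribute:

```javascript
// Hypothetical HTML snippet; URLs sit inside quoted attributes.
const html = '<img src="http://test/a.png"> <a href=\'https://test/b.jpg\'>b</a>';
const regex2 = /[ "']/;
const regex3 = /^https?:\/\//;
const parts = html.split(regex2);
// parts: ['<img', 'src=', 'http://test/a.png', '>', '<a', 'href=', 'https://test/b.jpg', '>b</a>']
const htmlUrls = parts.filter(s => regex3.test(s));
// htmlUrls: ['http://test/a.png', 'https://test/b.jpg']
console.log(htmlUrls);
```

Splitting on the quotes isolates each attribute value, and regex3 then discards every piece that does not start with a URL scheme.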
Explanation of regex3:
^https?:\/\/ -- literal http:// or https:// at the beginning of the string
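If you prefer a single match() call, the same boundaries can be folded into one regex using a lazy quantifier plus a lookahead. This is an alternative sketch, not part of the answer above; the name urlRegex is hypothetical, and it assumes a URL ends at the next http(s)://, at whitespace, at a quote, or at the end of the string.

```javascript
const input = 'any words abc sdf sdf http://test/files/1/1574322005738295297/2222681.jpeghttp://test/files/1/2040655876098/Image131607.png';
// .+? grows one character at a time until the lookahead sees the start of the
// next URL, a whitespace character, a quote, or the end of the input.
const urlRegex = /https?:\/\/.+?(?=https?:\/\/|[\s"']|$)/g;
const imgList = input.match(urlRegex) || []; // match() returns null when nothing matches
// imgList: ['http://test/files/1/1574322005738295297/2222681.jpeg', 'http://test/files/1/2040655876098/Image131607.png']
console.log(imgList);
```

Unlike regex1, this version splits at every http:// or https:// regardless of the preceding character, so it also separates URLs whose last character is not a word character.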