How to extract image URLs from a string using a JavaScript RegExp?

Time:12-14

I need to extract image URLs from a string like the one below:

let str = 'any words abc sdf sdf http://test/files/1/1574322005738295297/2222681.jpeghttp://test/files/1/2040655876098/Image131607.png';
I need to get the image URLs as an array:

const expectedImgList = ['http://test/files/1/1574322005738295297/2222681.jpeg','http://test/files/1/2040655876098/Image131607.png'];

How can I achieve that with a JavaScript RegExp?
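For comparison, here is a minimal single-regex sketch. It assumes that every image URL ends in one of a fixed set of extensions (the extension list and the helper name extractImageUrls are assumptions for illustration, not part of the question). A lazy quantifier makes each match stop at the first image extension it reaches, which is what separates the two glued URLs:

```javascript
// Sketch: collect image URLs in one pass with String.prototype.match.
// ASSUMPTION: every image URL ends with one of the extensions listed below.
const IMAGE_URL = /https?:\/\/\S*?\.(?:jpe?g|png|gif|webp)/gi;

function extractImageUrls(text) {
  // \S*? is lazy, so each match ends at the FIRST image extension it reaches;
  // this is what splits "...2222681.jpeghttp://..." into two URLs.
  // match() returns null when nothing is found, hence the fallback.
  return text.match(IMAGE_URL) || [];
}

const str = 'any words abc sdf sdf http://test/files/1/1574322005738295297/2222681.jpeghttp://test/files/1/2040655876098/Image131607.png';
console.log(extractImageUrls(str)); // the two URLs from expectedImgList
```

Because the match stops at the first extension it sees, a URL whose path contains an earlier segment such as "/a.png/" would be cut short, and URLs that do not end in one of the listed extensions are skipped.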

Answer:

I have made the following assumptions:

a. All files are accessed using http://, not https://.

b. The concatenation of two image URLs without a space between them in your example was deliberate.

c. Every image path is followed by another image path or a space character.

Based on these, I offer the following code, which uses a modified version of your sample str:

let str = 'any words abc sdf http://test/y.jpg sdf http://test/2222681.jpeghttp://test/Image131607.png http://test/x.jpg';
let link = "";
let regex = /\bhttp:\/\/[^ ]*/g;
let found = str.match(regex) || []; // match() returns null when there are no URLs
let result = [];
let r = 0, fi, f = 0;
while (f < found.length) {
    // one match can hold several glued URLs, so split it on the scheme
    let found_inner = found[f].split('http://');
    result[r++] = "http://" + found_inner[1]; // found_inner[0] is always ""
    fi = 2;
    while (fi < found_inner.length) {
        link = "http://" + found_inner[fi];
        result.splice(r++, 0, link);
        fi++;
    }
    f++;
}
console.log(result);
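The same match-then-split idea can be written more compactly with flatMap. This sketch is a variation, not the code above: it relaxes assumption (a) by also accepting https://, and it uses \S instead of [^ ] so tabs and newlines also end a URL. The helper name extractUrls is hypothetical. Splitting on a zero-width lookahead cuts each run just before every scheme, so the prefix does not have to be re-added:

```javascript
// Variation on the match-then-split approach; also accepts https://.
// extractUrls is a hypothetical helper name.
function extractUrls(text) {
  // Each match is a run of non-whitespace starting with http:// or https://;
  // glued URLs such as "...jpeghttp://..." end up in the same run.
  const runs = text.match(/\bhttps?:\/\/\S*/g) || [];
  // The lookahead split cuts each run just before every scheme and keeps the
  // "http://" / "https://" prefix on each piece; no empty piece is produced
  // for the match at position 0.
  return runs.flatMap(run => run.split(/(?=https?:\/\/)/));
}

const sample = 'any words abc sdf http://test/y.jpg sdf http://test/2222681.jpeghttps://test/Image131607.png http://test/x.jpg';
console.log(extractUrls(sample));
// → ['http://test/y.jpg', 'http://test/2222681.jpeg', 'https://test/Image131607.png', 'http://test/x.jpg']
```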

Answer:

It looks like you have multiple URLs concatenated without a space. Here is a solution that assumes a split should happen between a word character (the last character of the previous URL) and the beginning of the next URL (http:// or https://):

const str = 'any words abc sdf sdf http://test/files/1/1574322005738295297/2222681.jpeghttp://test/files/1/2040655876098/Image131607.png';
const regex1 = /(\w)(?=https?:\/\/)/g;
const regex2 = /[ "']/;
const regex3 = /^https?:\/\//;
let result = str
  .replace(regex1, '$1 ') // add space between URLs
  .split(regex2)          // split on space and quotes (in case of URL in HTML)
  .filter(s => regex3.test(s)); // keep only URLs
console.log(result);

Output:

[
  "http://test/files/1/1574322005738295297/2222681.jpeg",
  "http://test/files/1/2040655876098/Image131607.png"
]

Explanation of regex1:

  • (\w) -- capture group 1: one word char
  • (?=https?:\/\/) -- positive lookahead for http:// or https://
  • g flag: global, i.e. replace every occurrence, not just the first
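To see what the replace step does on its own, here is a small sketch with two made-up glued URLs (the sample strings are illustrative only). The second case shows the limitation implied by the (\w) group: if the first URL ends in a non-word character such as "/", no space is inserted:

```javascript
const regex1 = /(\w)(?=https?:\/\/)/g;

// A word character right before a scheme is replaced by itself ($1) plus a space.
const glued = 'http://test/a.jpeghttps://test/b.png';
const spaced = glued.replace(regex1, '$1 ');
console.log(spaced); // http://test/a.jpeg https://test/b.png

// Here the character before the second "http://" is "/", which is not \w,
// so the lookahead never fires and the URLs stay glued together.
const gluedAfterSlash = 'http://test/dir/http://test/b.png';
console.log(gluedAfterSlash.replace(regex1, '$1 ')); // unchanged
```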

Explanation of regex2:

  • [ "'] -- one of: a space, ", or '
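A quick sketch of the split step on a made-up HTML snippet (an assumed input, not from the question). String.prototype.split cuts at every match of the separator regex even though regex2 has no g flag:

```javascript
const regex2 = /[ "']/;

// Spaces and quote characters both act as separators.
const html = '<img src="http://test/a.png"> see http://test/b.jpeg';
const pieces = html.split(regex2);
console.log(pieces);
// → ['<img', 'src=', 'http://test/a.png', '>', 'see', 'http://test/b.jpeg']
```

Adjacent separators, such as the closing quote and the space in src="x" alt, produce empty strings; the regex3 filter discards them.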

Explanation of regex3:

  • ^https?:\/\/ -- literal http:// or https:// at the beginning of the string
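And the last step, sketched on an illustrative array of pieces: only strings that start with http:// or https:// survive, so an empty piece or text like "xhttp://..." is rejected by the ^ anchor:

```javascript
// No g flag on purpose: test() then does not depend on lastIndex.
const regex3 = /^https?:\/\//;

const pieces = ['<img', 'src=', 'http://test/a.png', '>', '', 'https://test/b.jpeg', 'xhttp://test/c.png'];
const urls = pieces.filter(s => regex3.test(s));
console.log(urls); // → ['http://test/a.png', 'https://test/b.jpeg']
```

regex3 deliberately has no g flag. With g, test() advances lastIndex after each successful match, so reusing the same regex across the strings in filter() could make it skip a valid URL.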