Regular expression to find all the image urls in a string

Time:01-02

I am trying to construct a regular expression that finds all image URLs in a string. An image URL can be either an absolute or a relative path.

All these should be valid matches:

 ../example/test.png
   
 https://www.test.com/abc.jpg
   
 images/test.webp

For example: if we define

inputString="img src=https://www.test.com/abc.jpg background:../example/test.png <div> images/test.webp image.pnghello"

then we should find these 3 matches:

https://www.test.com/abc.jpg
../example/test.png
images/test.webp

I am currently doing this (I am using Python), but it only finds absolute paths, finds only some of the images, and sometimes produces bad matches (it finds a string that contains an image URL but appends a lot of the text that follows it):

imageurls = re.findall(r'(?:"|\')((?:https?://|/)\S+\.(?:jpg|png|gif|jpeg|webp))(?:"|\')', inputString)
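
The symptoms described above can be reproduced with a short sketch. It assumes the pattern is read with a `+` after `\S`; the two extra inputs (`relative` and `greedy`) are hypothetical and not taken from the post.

```python
import re

# The question's pattern, assuming a `+` quantifier after \S.
QUOTED = re.compile(
    r'(?:"|\')((?:https?://|/)\S+\.(?:jpg|png|gif|jpeg|webp))(?:"|\')'
)

input_string = (
    "img src=https://www.test.com/abc.jpg background:../example/test.png "
    "<div> images/test.webp image.pnghello"
)

# The sample contains no quote characters, and the pattern requires one
# on each side of the URL, so nothing can match.
no_quotes = QUOTED.findall(input_string)

# Hypothetical input: a quoted relative path. The pattern only accepts
# URLs that start with http://, https:// or /, so this is skipped too.
relative = QUOTED.findall('src="../example/test.png"')

# Hypothetical input: two quoted URLs with no whitespace between them.
# The greedy \S+ runs through the first closing quote up to the last one.
greedy = QUOTED.findall('url("/img/a.png"),url("/img/b.png")')

print(no_quotes)
print(relative)
print(greedy)
```

The first two calls return empty lists, and the third returns a single over-long match that spans both URLs, which corresponds to the "bad matches" mentioned in the question.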

CodePudding user response:

You can try:

(?i)https?\S+(?:jpg|png|webp)\b|[^:<>\s\'\"]+(?:jpg|png|webp)\b


import re

s = '''img src=https://www.test.com/abc.jpg background:../example/test.png <div> images/test.webp image.pnghellobackground-image: url('../images/pics/mobile/img.JPG')'''
pat = re.compile(r'(?i)https?\S+(?:jpg|png|webp)\b|[^:<>\s\'\"]+(?:jpg|png|webp)\b')

for m in pat.findall(s):
    print(m)

Prints:

https://www.test.com/abc.jpg
../example/test.png
images/test.webp
../images/pics/mobile/img.JPG
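
To make the two alternatives of this pattern easier to read, here is the same expression sketched in verbose form, assuming a `+` after `\S` and after the character class. The names `IMAGE_URL` and `find_image_urls` are made up for this illustration.

```python
import re

# Same two alternatives as the one-line pattern, spread out with comments.
IMAGE_URL = re.compile(
    r"""
    https?\S+(?:jpg|png|webp)\b      # http or https, then non-space text ending in an extension
    |                                # or
    [^:<>\s'"]+(?:jpg|png|webp)\b    # a run free of colon, angle brackets, quotes and whitespace
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_image_urls(text):
    """Return every substring of text that the pattern accepts."""
    return IMAGE_URL.findall(text)


s = ("img src=https://www.test.com/abc.jpg background:../example/test.png "
     "<div> images/test.webp image.pnghello"
     "background-image: url('../images/pics/mobile/img.JPG')")

for url in find_image_urls(s):
    print(url)
```

The `\b` after the extension is what rejects `image.pnghello`. Note that neither alternative requires a literal dot before the extension, so a word such as `mypng` would also be accepted; placing `\.` in front of the extension group is one possible way to tighten it.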

CodePudding user response:

What do you think of this:

re.findall(r'(?=:[^\S])?(?:https?://)?[\./]*[\w/\.]+\.(?:jpg|png|gif|jpeg|webp)', inputString)

With:

"img src=http://another.org/hola.gif https://www.test.com/abc.jpg background:../example/test.png <div> images/test.webp image.pnghello"

Gives:

 ['http://another.org/hola.gif',
 'https://www.test.com/abc.jpg',
 '../example/test.png',
 'images/test.webp',
 'image.png']

This probably needs more test samples :)
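
As a starting point for more test samples, here is a minimal sketch of a check loop. It assumes the pattern above with a `+` after the `[\w/\.]` class. The sample list and the name `run_samples` are hypothetical, and the expected values reflect one reading of the question's requirements.

```python
import re

# Pattern from this answer, assuming a `+` after the [\w/\.] class.
PATTERN = re.compile(
    r'(?=:[^\S])?(?:https?://)?[\./]*[\w/\.]+\.(?:jpg|png|gif|jpeg|webp)'
)

# Hypothetical samples: (input text, matches the question would expect).
SAMPLES = [
    ("src=https://www.test.com/abc.jpg", ["https://www.test.com/abc.jpg"]),
    ("background:../example/test.png", ["../example/test.png"]),
    ("photo.jpeg", ["photo.jpeg"]),
    # Extension runs straight into more text: the question wants no match.
    ("image.pnghello", []),
    # Upper-case extension.
    ("url('../images/pics/mobile/img.JPG')", ["../images/pics/mobile/img.JPG"]),
]


def run_samples(pattern, samples):
    """Return (text, expected, actual) for each sample whose matches differ."""
    failures = []
    for text, expected in samples:
        actual = pattern.findall(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


for text, expected, actual in run_samples(PATTERN, SAMPLES):
    print(f"{text!r}: expected {expected}, got {actual}")
```

With these samples, the loop reports two mismatches: `image.pnghello` yields `image.png`, and the upper-case `.JPG` path is not found. A trailing `\b` and the `re.IGNORECASE` flag are two adjustments that might address them.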
