Regex is failing for few URLs

Time:09-23

I have the regex /url\s*\((?!['("]?(?:data):)['"]?([^')"\)]*)['"]?([\)]|$)/gi, which we use to parse a style tag and extract the URL (e.g. backgroundImage). It fails for URLs such as:

1] background-image: url(\2f content\2f dam\2f dx\2fus\2f en\2f error-pages\2f 404-1440x612_edge2.jpg\2fjcr:content\2frenditions\2f cq5dam.tablet_1400.1400.595.jpg); background-position: 50% 50%;

should be => url(\2f content\2f dam\2f dx\2fus\2f en\2f error-pages\2f 404-1440x612_edge2.jpg\2fjcr:content\2frenditions\2f cq5dam.tablet_1400.1400.595.jpg)

2] background-image: url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")

should be => url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")

It fails at contrast(10): because it treats the closing parenthesis of contrast(10) as the end of the URL.
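A minimal sketch that reproduces this failure in JavaScript. The question does not show the calling code, so the use of String.prototype.match and the variable names here are assumptions for illustration only:

```javascript
// Original pattern from the question, unchanged.
const originalPattern = /url\s*\((?!['("]?(?:data):)['"]?([^')"\)]*)['"]?([\)]|$)/gi;

// Input 2 from the question.
const style =
  'background-image: url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")';

// The character class [^')"\)]* cannot cross a ")", so the match is cut off
// at the first closing parenthesis inside the URL.
const truncated = style.match(originalPattern);
console.log(truncated);
```

The printed match should stop right after contrast(10) instead of running to the closing quote.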

3] background-image:url('https://cdn.comcast.com/-/media/Images/www_xfinity_com/TV/X1/09072021 Refresh/10X1HeroDesktop.png?rev=d04c61c0-3658-457d-8260-74ef6694c0ed&mw=1280&mh=600&hash=6A1C4FEC8499EE38864BA31D24B9E42220D8C7EB')" background-size: cover;

should be => url('https://cdn.comcast.com/-/media/Images/www_xfinity_com/TV/X1/09072021 Refresh/10X1HeroDesktop.png?rev=d04c61c0-3658-457d-8260-74ef6694c0ed&mw=1280&mh=600&hash=6A1C4FEC8499EE38864BA31D24B9E42220D8C7EB')

4] style="position:absolute; background:transparent url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=) repeat 0 0"

should be => url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=)

CodePudding user response:

You could use an alternation | to get the different formats.

url\s*\((?:data:image\S+|(['"]).*?\1|[^()]*)\)

In parts, the pattern matches:

  • url\s*\( Match url, optional whitespace chars and (
  • (?: Non-capture group for the alternation
    • data:image\S+ Match data:image and 1 or more non-whitespace chars
    • | Or
    • (['"]).*?\1 Match from the opening quote to the matching closing quote
    • | Or
    • [^()]* Match 0 or more times any char except parentheses
  • ) Close the non-capture group
  • \) Match )
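As a rough check, the pattern can be run against the four inputs from the question. This is only a sketch: it assumes the quantifier after \S is +, and extractCssUrls and samples are hypothetical names that do not appear in the question.

```javascript
// Pattern from the answer, written with the + quantifier after \S.
const urlPattern = /url\s*\((?:data:image\S+|(['"]).*?\1|[^()]*)\)/gi;

// Hypothetical helper: returns every url(...) token found in a style string.
function extractCssUrls(style) {
  // matchAll works on a copy of the regex, so lastIndex state is not shared
  // between calls even though the pattern has the g flag.
  return Array.from(style.matchAll(urlPattern), (m) => m[0]);
}

// The four inputs from the question (backslashes doubled for the JS string).
const samples = [
  "background-image: url(\\2f content\\2f dam\\2f dx\\2fus\\2f en\\2f error-pages\\2f 404-1440x612_edge2.jpg\\2fjcr:content\\2frenditions\\2f cq5dam.tablet_1400.1400.595.jpg); background-position: 50% 50%;",
  'background-image: url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")',
  `background-image:url('https://cdn.comcast.com/-/media/Images/www_xfinity_com/TV/X1/09072021 Refresh/10X1HeroDesktop.png?rev=d04c61c0-3658-457d-8260-74ef6694c0ed&mw=1280&mh=600&hash=6A1C4FEC8499EE38864BA31D24B9E42220D8C7EB')" background-size: cover;`,
  'style="position:absolute; background:transparent url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=) repeat 0 0"',
];

for (const sample of samples) {
  console.log(extractCssUrls(sample));
}
```

Each printed array should hold the single url(...) value listed as "should be" in the question. Note that an unquoted URL containing a space in its data:image part, or unquoted nested parentheses, would still not be matched by this pattern.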
