I have a regex, /url\s*\((?!['("]?(?:data):)['"]?([^')"\)]*)['"]?([\)]|$)/gi,
which we use to parse style tags and extract URLs (e.g. backgroundImage). It fails for URLs such as:
1] background-image: url(\2f content\2f dam\2f dx\2fus\2f en\2f error-pages\2f 404-1440x612_edge2.jpg\2fjcr:content\2frenditions\2f cq5dam.tablet_1400.1400.595.jpg); background-position: 50% 50%;
should be => url(\2f content\2f dam\2f dx\2fus\2f en\2f error-pages\2f 404-1440x612_edge2.jpg\2fjcr:content\2frenditions\2f cq5dam.tablet_1400.1400.595.jpg)
2] background-image: url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")
should be => url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")
It fails at contrast(10): because it treats the closing parenthesis of contrast(10) as the end of the URL.
3] background-image:url('https://cdn.comcast.com/-/media/Images/www_xfinity_com/TV/X1/09072021 Refresh/10X1HeroDesktop.png?rev=d04c61c0-3658-457d-8260-74ef6694c0ed&mw=1280&mh=600&hash=6A1C4FEC8499EE38864BA31D24B9E42220D8C7EB')" background-size: cover;
should be => url('https://cdn.comcast.com/-/media/Images/www_xfinity_com/TV/X1/09072021 Refresh/10X1HeroDesktop.png?rev=d04c61c0-3658-457d-8260-74ef6694c0ed&mw=1280&mh=600&hash=6A1C4FEC8499EE38864BA31D24B9E42220D8C7EB')
4] style="position:absolute; background:transparent url() repeat 0 0"
should be => url()
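To reproduce the failure described for example 2, here is a minimal Node.js sketch that runs the question's regex unchanged against that input. The variable names are only for illustration. The character class [^')"\)]* excludes ")", so the capture cannot step past the ")" of contrast(10).

```javascript
// The question's regex, copied unchanged.
const original = /url\s*\((?!['("]?(?:data):)['"]?([^')"\)]*)['"]?([\)]|$)/gi;

// Example 2 from the question (variable name is hypothetical).
const example2 =
  'background-image: url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")';

// [^')"\)]* cannot consume ")", so the match stops at the ")" of contrast(10).
const m = original.exec(example2);
console.log(m[0]); // whole match, ends with "contrast(10)"
console.log(m[1]); // captured URL, cut off after "contrast(10"
```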
Answer:
You could use an alternation | to match the different formats:
url\s*\((?:data:image\S+|(['"]).*?\1|[^()]*)\)
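Here is a quick sketch that checks the pattern against the four inputs from the question in Node.js. It assumes the quantifier after \S is "+". The copied post shows a space there, which looks like a lost "+". The names urlPattern, styles and found are hypothetical.

```javascript
// The answer's pattern, assuming "\S+" (the "+" appears lost in the copied post).
const urlPattern = /url\s*\((?:data:image\S+|(['"]).*?\1|[^()]*)\)/gi;

// The four inputs from the question, copied as given (backslashes escaped for JS).
const styles = [
  "background-image: url(\\2f content\\2f dam\\2f dx\\2fus\\2f en\\2f error-pages\\2f 404-1440x612_edge2.jpg\\2fjcr:content\\2frenditions\\2f cq5dam.tablet_1400.1400.595.jpg); background-position: 50% 50%;",
  'background-image: url("https://www.investopedia.com/thmb/m3EwtlYfbUhlr9e34AofFj9wok8=/1300x0/filters:contrast(10):brightness(-10):no_upscale()/TopTerms-2bdc464d466944deb41fc07379407600.jpeg")',
  `background-image:url('https://cdn.comcast.com/-/media/Images/www_xfinity_com/TV/X1/09072021 Refresh/10X1HeroDesktop.png?rev=d04c61c0-3658-457d-8260-74ef6694c0ed&mw=1280&mh=600&hash=6A1C4FEC8499EE38864BA31D24B9E42220D8C7EB')" background-size: cover;`,
  'style="position:absolute; background:transparent url() repeat 0 0"',
];

// With a /g regex, match() returns all full matches (or null); keep the first one.
const found = styles.map((s) => (s.match(urlPattern) || [])[0]);
found.forEach((f) => console.log(f));
```

Each entry in found is the value listed under "should be" in the question. The quoted branch runs to the matching quote, so the parentheses inside contrast(10) no longer end the match.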
In parts, the pattern matches:
- url\s*\(  Match url, optional whitespace chars and (
- (?:  Non-capturing group for the alternation
  - data:image\S+  Match data:image and 1 or more non-whitespace chars
  - |  Or
  - (['"]).*?\1  Match from an opening quote to the same closing quote
  - |  Or
  - [^()]*  Match 0 or more of any char except parentheses
- )  Close the non-capturing group
- \)  Match )
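The sketch below builds the pattern from the pieces of the breakdown above, one string per item. It then tries the pattern on a hypothetical style string containing a data:image URI. The names parts, built, literal and sample are illustrative only, and the pattern again assumes "\S+".

```javascript
// Each string is one item from the breakdown; joining them rebuilds the pattern.
const parts = [
  "url\\s*\\(",     // url, optional whitespace chars and (
  "(?:",            // non-capturing group for the alternation
  "data:image\\S+", //   data:image and 1 or more non-whitespace chars
  "|",              //   or
  "(['\"]).*?\\1",  //   an opening quote up to the same closing quote
  "|",              //   or
  "[^()]*",         //   0 or more chars that are not parentheses
  ")",              // close the non-capturing group
  "\\)",            // a literal )
];
const built = new RegExp(parts.join(""), "gi");

// Same source text as the regex literal in the answer.
const literal = /url\s*\((?:data:image\S+|(['"]).*?\1|[^()]*)\)/gi;
console.log(built.source === literal.source); // true

// Hypothetical input: a data:image URI followed by a quoted URL.
const sample =
  "background: url(data:image/png;base64,iVBORw0KGgo=) no-repeat; background-image: url('a.png')";
const matches = sample.match(built);
console.log(matches);
```

One caveat: \S+ is greedy. If a data:image URI is followed by another url(...) with no whitespace in between, as in url(data:image/png;base64,AA),url(b.png), the match runs to the last ")" and swallows both.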