I'm trying to write a regex that gets URLs that contain a special word. The URLs can have different protocols and end with different characters, for example:
1 "http://test1.special.com" => should match
2 https://test2.special.com; => should match
3 //test3.special.com; //test3.special.com) => should match twice
4 http://test4.notspecial.com =>no match
I wrote this pattern:
\/\/(.*).*?(special.*(\)|\;|\"))
Line 3 is selected as a single result instead of two separate matches: https://regex101.com/r/FOQgjN/1 How can I make it select each URL separately, even when there are several on one line? Thanks
CodePudding user response:
Granted, it can be more specific, but maybe the following would suit your needs:
(?:https?:)?\/\/.*?(?=["; )]|$)
(?:https?:)? - Optional non-capture group to match 'http:' or 'https:';
\/\/.*? - Match two forward slash characters and 0+ (lazy) characters other than newline, up to;
(?=["; )]|$) - Positive lookahead to assert that the next character is in the character class '["; )]' or that the end-line anchor follows.
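For anyone who wants to try this outside regex101, here is a minimal Python sketch that runs the pattern over the sample lines from the question. The names `URL_PATTERN`, `SPECIAL_WORD`, `find_special_urls` and `SAMPLES` are made up for illustration. Note that the pattern by itself does not look for the word "special", so the sketch adds a separate whole-word check; that check is my assumption about what "special word" should mean, not part of the pattern above.

```python
import re

# Pattern from the answer; Python does not need the slashes escaped.
URL_PATTERN = re.compile(r'(?:https?:)?//.*?(?=["; )]|$)')

# Assumption: "special" must appear as a whole word, so "notspecial" is rejected.
SPECIAL_WORD = re.compile(r'\bspecial\b')


def find_special_urls(line):
    """Return every URL in `line` that contains the whole word 'special'."""
    urls = (m.group(0) for m in URL_PATTERN.finditer(line))
    return [url for url in urls if SPECIAL_WORD.search(url)]


# Sample lines from the question, without the leading numbers and comments.
SAMPLES = [
    '"http://test1.special.com"',
    'https://test2.special.com;',
    '//test3.special.com; //test3.special.com)',
    'http://test4.notspecial.com',
]

for line in SAMPLES:
    print(find_special_urls(line))
```

Because the terminator is only asserted by the lookahead and never consumed, the returned URLs do not include the trailing quote, semicolon or parenthesis.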
CodePudding user response:
Add ? after every quantifier that can match multiple characters (such as .*). That makes it non-greedy (lazy), so it stops at the first possible terminator instead of the last. So like:
\/\/(.*?).*?(special.*?(\)|\;|\"))
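To see the difference, here is a small Python sketch that compares the original greedy pattern with the lazy one on line 3 from the question. The variable names are just for illustration, and the slashes and semicolon are left unescaped because Python does not require escaping them.

```python
import re

# Line 3 from the question: two URLs on one line.
LINE = '//test3.special.com; //test3.special.com)'

# Original pattern from the question: the quantifiers are greedy.
GREEDY = re.compile(r'//(.*).*?(special.*(\)|;|"))')

# Pattern from this answer: every quantifier is lazy.
LAZY = re.compile(r'//(.*?).*?(special.*?(\)|;|"))')

greedy_matches = [m.group(0) for m in GREEDY.finditer(LINE)]
lazy_matches = [m.group(0) for m in LAZY.finditer(LINE)]

print(greedy_matches)  # one match spanning the whole line
print(lazy_matches)    # two matches, one per URL
```

Keep in mind that both patterns consume the terminating character, so each lazy match still ends with the ; or ) that closed it.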