import re
s="facebook.com/https://www.facebook.com/test/"
re.findall(r"facebook\.com/[^?\"\'>&\*\n\r\\\ <]+?", s)
I only want "facebook.com/test/" as the result, but I'm getting:
facebook.com/h
facebook.com/t
What's wrong with my regex? I added the "?" at the end of the expression, thinking it would make the match non-greedy, but it seems to be treated as a 0-or-1 quantifier.
If I remove the "?", I get:
facebook.com/https://www.facebook.com/test/
Answer:
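The non-greedy modifier works forwards but not backwards: it controls where a match ends, not where it starts. The regex engine tries start positions from left to right, so once the first instance of facebook.com/ matches, that match is kept as long as the rest of the pattern succeeds, non-greedy or not. With +? the character class then consumes just one character, which is why you get facebook.com/h and facebook.com/t.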
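A quick way to see this is to compare where the greedy and lazy versions of the pattern start. The snippet below is a sketch that reuses the question's string and excluded characters (written as a raw-string character class with the redundant escapes dropped); both matches begin at index 0, and only their end differs.

```python
import re

s = "facebook.com/https://www.facebook.com/test/"
# Same excluded characters as in the question, as a raw-string character class.
cls = r"[^?\"'>&*\n\r\\ <]"

greedy = re.search(r"facebook\.com/" + cls + "+", s)
lazy = re.search(r"facebook\.com/" + cls + "+?", s)

# Both searches start at the first "facebook.com/"; laziness only moves the end.
print(greedy.start(), greedy.group())  # 0 facebook.com/https://www.facebook.com/test/
print(lazy.start(), lazy.group())      # 0 facebook.com/h

# findall resumes after each match, so the lazy pattern also finds the second occurrence.
print(re.findall(r"facebook\.com/" + cls + "+?", s))  # ['facebook.com/h', 'facebook.com/t']
```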
To match only the last instance of facebook.com/, you can use a negative lookahead that rejects any occurrence followed later in the string by another facebook.com/:
facebook\.com/(?!.*facebook\.com/)[^?\"\'>&\*\n\r\\\ <]+
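As a sketch of the lookahead approach in runnable form (same excluded characters, written as a raw string with the redundant escapes dropped), only the last facebook.com/ in the sample string is allowed to start a match:

```python
import re

s = "facebook.com/https://www.facebook.com/test/"

# (?!.*facebook\.com/) fails whenever another "facebook.com/" appears later,
# so only the last occurrence can start a match; the greedy + then takes "test/".
pattern = re.compile(r"facebook\.com/(?!.*facebook\.com/)[^?\"'>&*\n\r\\ <]+")

print(pattern.findall(s))  # ['facebook.com/test/']
```

Because . does not match a newline by default, the lookahead only checks the rest of the current line; passing re.DOTALL would make it consider the rest of a multi-line string instead.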