I am working with regex in Python and trying to write a pattern so that if the URL starts with https it must contain www3, and if it starts with http it must contain www. My solution works for https, but for http the match does not include http. Can anybody help me correct this?
st='''
https://www3.yahoo.com
http://www.yahoo.com
'''
p=re.compile(r'(https)?://(?(1)www3|www)\.\w+\.\w+')
CodePudding user response:
It would seem the simplest solution is just to write out both alternatives:
import re
st = '''
https://www3.yahoo.com
https://www.yahoo.com
http://www3.yahoo.com
http://www.yahoo.com
'''
p = re.compile(r'http(?:s://www3|://www)\.\w+\.\w+')
p.findall(st)
Output:
['https://www3.yahoo.com', 'http://www.yahoo.com']
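If the goal is to validate one URL at a time rather than scan a block of text, the same alternation can be used with fullmatch. This is only a minimal sketch: the helper name has_valid_prefix is made up for illustration, and it assumes each URL arrives as its own string.

```python
import re

# Same alternation as in the answer above: "s" is tied to www3, no "s" to www.
PAIRING = re.compile(r'http(?:s://www3|://www)\.\w+\.\w+')


def has_valid_prefix(url: str) -> bool:
    """Hypothetical helper: True only if the whole string fits the pairing."""
    return PAIRING.fullmatch(url) is not None


urls = [
    "https://www3.yahoo.com",
    "https://www.yahoo.com",
    "http://www3.yahoo.com",
    "http://www.yahoo.com",
]
results = {u: has_valid_prefix(u) for u in urls}
print(results)
```

Only the first and last URLs in the list are accepted, which agrees with the findall output shown above.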
CodePudding user response:
A plain solution; simple, but it works:
re.findall(r'(http(?P<s>s)?://www(?(s)3|)\..*)', """
https://www3.yahoo.com
http://www.yahoo.com
http://www3.yahoo.com
https://www34.yahoo.com
""")
[('https://www3.yahoo.com', 's'), ('http://www.yahoo.com', '')]
Explanation
- (?P<s>s): the (?P<name>...) syntax gives the group a name.
- (?(s)...): the (?(<id|name>)...) syntax refers to a group that matched earlier.
- (?(s)3|): the (?(<id|name>)yes-pattern|no-pattern) syntax chooses the yes-pattern if the group matched, and the no-pattern otherwise.
Advice
- A group id such as (1) does not always work: you have to watch the group order and work out the index yourself, which is an easy source of errors.
- A group name such as (name) is a good way to avoid that problem.
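To illustrate the advice about group order, here is a small sketch. It assumes someone later wraps http in its own capturing group, which is a made-up change for the sake of the example. The numbered conditional then silently refers to the wrong group, while the named one keeps working.

```python
import re

# Numbered reference: group 1 is now (http), which always participates,
# so the conditional always demands www3, even for plain http.
numbered = re.compile(r'(http)(s)?://(?(1)www3|www)\.\S+')

# Named reference: still tied to the optional "s", whatever its index is.
named = re.compile(r'(http)(?P<s>s)?://(?(s)www3|www)\.\S+')

numbered_hit = numbered.fullmatch("http://www.yahoo.com")
named_hit = named.fullmatch("http://www.yahoo.com")

print(numbered_hit)        # None: the valid URL is rejected
print(named_hit.group(0))  # http://www.yahoo.com
```

The numbered version also accepts http://www3.yahoo.com, which is exactly the combination the pattern was meant to reject.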
CodePudding user response:
For the conditional to work, you have to make only the s character optional:
http(s)?://(?(1)www3|www)\.\w+\.\w+
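A short comparison may make the difference clearer. When the whole (https) group is optional and does not match, nothing in the pattern is left to consume http, so the match begins at ://. The sketch below reuses the sample text from the question; the variable names are only illustrative.

```python
import re

st = '''
https://www3.yahoo.com
http://www.yahoo.com
'''

# Pattern from the question: the whole "https" is optional.
original = re.compile(r'(https)?://(?(1)www3|www)\.\w+\.\w+')

# Pattern from this answer: "http" is mandatory, only the "s" is optional.
fixed = re.compile(r'http(s)?://(?(1)www3|www)\.\w+\.\w+')

# finditer with group(0) returns the full match text; findall would
# return only the contents of group 1 here.
original_matches = [m.group(0) for m in original.finditer(st)]
fixed_matches = [m.group(0) for m in fixed.finditer(st)]

print(original_matches)  # ['https://www3.yahoo.com', '://www.yahoo.com']
print(fixed_matches)     # ['https://www3.yahoo.com', 'http://www.yahoo.com']
```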
Note that \.\w+\.\w+ is quite restrictive for matching a URL. For a broader match, you could use \S+ to match one or more non-whitespace characters:
http(s)?://(?(1)www3|www)\.\S+
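As a rough illustration of the difference, the sketch below runs both variants on a made-up URL with an extra domain label and a path. The URL and the variable names are assumptions for the example only.

```python
import re

narrow = re.compile(r'http(s)?://(?(1)www3|www)\.\w+\.\w+')
broad = re.compile(r'http(s)?://(?(1)www3|www)\.\S+')

url = "http://www.yahoo.co.uk/news"  # made-up example URL

narrow_match = narrow.search(url).group(0)
broad_match = broad.search(url).group(0)

print(narrow_match)  # http://www.yahoo.co
print(broad_match)   # http://www.yahoo.co.uk/news
```

The trade-off is that \S+ also consumes any non-whitespace character that follows the URL, such as a trailing comma or closing parenthesis in running text.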
CodePudding user response:
(https|http)://(www3|www).\w+.+\w+
Try this one.