I have two URL strings and I am building a regexp to extract the domain, project and repo from each URL.
The URLs look like this:
1. https://bitbucket.org/test/test-x.git
2. ssh://git@bitbucket.org/test/test-x
I have this regexp working:
r"(?:ssh|https):\/\/(?:(?:git@|)bitbucket.org)\/([^/]*)\/([^/]*)"
I am able to get the project and slug groups. Now I also want to strip .git in the regexp if it's present in the slug group. I tried adding a non-capturing group (?:.git|), but with no success.
How can I omit .git with the regexp if it is present in the string? What am I missing here?
CodePudding user response:
Try:
(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/(.*?)(?:\.git|$)
(?:ssh|https) - match ssh or https
(?:git@)? - optionally match git@
([^/]*) - match the first part of the path
(.*?)(?:\.git|$) - match everything until .git or the end of the line
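To check the pattern above against the two URLs from the question, here is a minimal Python sketch. The function name parse_lazy is hypothetical and only used for illustration; the pattern itself is copied from this answer.

```python
import re

# Pattern from the answer above: a lazy slug followed by either ".git" or end of string.
LAZY_PATTERN = re.compile(
    r"(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/(.*?)(?:\.git|$)"
)


def parse_lazy(url):
    """Return (project, slug) or None if the URL does not match (hypothetical helper)."""
    match = LAZY_PATTERN.search(url)
    return match.groups() if match else None


print(parse_lazy("https://bitbucket.org/test/test-x.git"))
print(parse_lazy("ssh://git@bitbucket.org/test/test-x"))
```

One thing to be aware of: because \.git is not anchored to the end, the lazy group stops at the first .git it finds, so a slug such as my.github.io would be cut down to my.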
CodePudding user response:
You can omit the outer non-capturing group (?: from your pattern, because by itself it has no purpose in this context.
Note that you need to escape the dot as \. to match it literally.
If you want to match the slug without a dot, you could exclude the dot in the last character class: [^/.]*
\b(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/([^/.]*)
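As a quick check of the character-class variant above, the following Python sketch runs it on both URLs from the question. The helper name parse_no_dot is hypothetical; the pattern is the one shown in this answer.

```python
import re

# Pattern from the answer above: the slug may not contain "/" or ".".
NO_DOT_PATTERN = re.compile(
    r"\b(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/([^/.]*)"
)


def parse_no_dot(url):
    """Return (project, slug) or None if the URL does not match (hypothetical helper)."""
    match = NO_DOT_PATTERN.search(url)
    return match.groups() if match else None


print(parse_no_dot("https://bitbucket.org/test/test-x.git"))
print(parse_no_dot("ssh://git@bitbucket.org/test/test-x"))
```

This works as long as slugs never contain a dot; a slug like my.repo would be captured as my.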
Or you could match any non-whitespace character except for / and . and only match the . when it is not directly followed by the word git:
^(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/\s]*)\/([^/.\s]*(?:\.(?!git\b)[^/.\s]*)*)
Explanation
^ - Start of string
(?:ssh|https):\/\/ - Match ssh or https, then ://
(?:git@)?bitbucket\.org\/ - Optionally match git@, then match bitbucket.org/
([^/\s]*)\/ - Capture group 1: match zero or more non-whitespace chars other than /, then match the /
( - Capture group 2
[^/.\s]* - Match zero or more non-whitespace chars other than / and .
(?: - Non-capturing group (to repeat as a whole part)
\.(?!git\b) - Match a dot if it is not directly followed by the word git
[^/.\s]* - Match zero or more non-whitespace chars other than / and .
)* - Close the non-capturing group and optionally repeat it
) - Close group 2
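The lookahead variant explained above can be tried out with a short Python sketch. The helper name parse_lookahead and the extra test URLs with dotted slugs are assumptions added for illustration; the pattern is the one from this answer.

```python
import re

# Pattern from the answer above: dots are allowed in the slug
# unless the dot is directly followed by the whole word "git".
LOOKAHEAD_PATTERN = re.compile(
    r"^(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/"
    r"([^/\s]*)\/([^/.\s]*(?:\.(?!git\b)[^/.\s]*)*)"
)


def parse_lookahead(url):
    """Return (project, slug) or None if the URL does not match (hypothetical helper)."""
    match = LOOKAHEAD_PATTERN.search(url)
    return match.groups() if match else None


for url in (
    "https://bitbucket.org/test/test-x.git",
    "ssh://git@bitbucket.org/test/test-x",
    "ssh://git@bitbucket.org/test/my.repo.git",   # dotted slug, assumed example
    "https://bitbucket.org/test/my.github.io",    # ".github" is not the word "git"
):
    print(url, "->", parse_lookahead(url))
```

Because of the word boundary in (?!git\b), a part such as .github is kept in the slug, while a trailing .git is left out.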