How to strip out the suffix from url in regexp?

Time:11-15

I have two URL strings, and I am building a regexp to extract the domain, project and repo from them.

The URLs look like this:

1. https://bitbucket.org/test/test-x.git

2. ssh://git@bitbucket.org/test/test-x

I have this regexp working:

r"(?:ssh|https):\/\/(?:(?:git@|)bitbucket.org)\/([^/]*)\/([^/]*)"

I am able to get the project and slug groups. Now I also want the regexp to strip .git from the slug group if it's present. I tried adding a non-capturing group (?:.git|), but with no success.

How can I omit .git with a regexp when it is present in the string? What am I missing here?
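A quick way to reproduce the problem, assuming Python's re module (the pattern is written as a Python raw string). It shows why appending the optional group does not help: the greedy [^/]* has already consumed .git, so the empty alternative of (?:.git|) matches and the slug keeps its suffix.

```python
import re

# Pattern from the question: the last group [^/]* also consumes ".git".
ORIGINAL = r"(?:ssh|https):\/\/(?:(?:git@|)bitbucket.org)\/([^/]*)\/([^/]*)"
# Appending the optional group tried in the question does not help: the greedy
# [^/]* has already taken ".git", so the empty alternative matches instead.
WITH_SUFFIX = ORIGINAL + r"(?:.git|)"

urls = [
    "https://bitbucket.org/test/test-x.git",
    "ssh://git@bitbucket.org/test/test-x",
]

for url in urls:
    print(re.search(ORIGINAL, url).groups(), re.search(WITH_SUFFIX, url).groups())
# ('test', 'test-x.git') ('test', 'test-x.git')
# ('test', 'test-x') ('test', 'test-x')
```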

Answer:

Try:

(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/(.*?)(?:\.git|$)

Regex demo.


(?:ssh|https) - match ssh or https

(?:git@)? - optionally match git@

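bitbucket\.org\/ - match bitbucket.org/, with the dot escaped

([^/]*) - capture the first path segment (the project)

(.*?)(?:\.git|$) - lazily capture everything up to .git or the end of the string (the slug)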
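A minimal Python sketch of this answer's pattern, assuming Python's re module; the helper name project_and_slug is hypothetical. With re.search and no re.MULTILINE flag, $ means the end of the string.

```python
import re

# Pattern from the answer: the lazy (.*?) stops at ".git" or at the end of the string.
PATTERN = re.compile(r"(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/(.*?)(?:\.git|$)")

def project_and_slug(url):
    """Hypothetical helper: return (project, slug), or None if the URL does not match."""
    m = PATTERN.search(url)
    return (m.group(1), m.group(2)) if m else None

print(project_and_slug("https://bitbucket.org/test/test-x.git"))  # ('test', 'test-x')
print(project_and_slug("ssh://git@bitbucket.org/test/test-x"))    # ('test', 'test-x')
```

One thing to be aware of: the lazy group stops at the first .git it meets, so a hypothetical slug such as my.gitlab-mirror would be captured as just my. The lookahead pattern in the next answer avoids that.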

Answer:

You can omit the outer non-capturing group (?: ... ) around (?:git@|)bitbucket.org in your pattern, because by itself it serves no purpose in this context. The inner (?:git@|) can also be written as (?:git@)?.

Note that you should escape the dot as \. to match it literally; an unescaped . matches any character.
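To see why the escape matters, a small Python check using re.fullmatch. The host string bitbucket-org is a made-up look-alike used only for illustration, not a real URL.

```python
import re

unescaped = r"bitbucket.org"   # "." matches any single character
escaped = r"bitbucket\.org"    # "\." matches only a literal dot

host = "bitbucket-org"         # hypothetical look-alike host for illustration
print(bool(re.fullmatch(unescaped, host)))  # True
print(bool(re.fullmatch(escaped, host)))    # False
```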

If the slug never contains a dot, you can exclude the dot in the last character class, [^/.]*, so the match stops right before .git:

\b(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/([^/.]*)

See a regex demo.
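A Python sketch of the dot-free variant, assuming Python's re module. The extra slug my.repo is hypothetical and shows the trade-off: everything from the first dot on is dropped.

```python
import re

# [^/.]* stops the slug at the first dot, so ".git" is never captured.
SLUG_NO_DOT = re.compile(r"\b(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/]*)\/([^/.]*)")

for url in ("https://bitbucket.org/test/test-x.git",
            "ssh://git@bitbucket.org/test/test-x"):
    print(SLUG_NO_DOT.search(url).groups())  # ('test', 'test-x') for both URLs

# Trade-off: a slug that itself contains a dot is cut at that dot.
print(SLUG_NO_DOT.search("https://bitbucket.org/test/my.repo.git").group(2))  # my
```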

Alternatively, if the slug may contain dots, match any non-whitespace character except / and ., and only match a . when it is not directly followed by the word git:

^(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/\s]*)\/([^/.\s]*(?:\.(?!git\b)[^/.\s]*)*)

Explanation

  • ^ Start of string
  • (?:ssh|https):\/\/ Match ssh or https then ://
  • (?:git@)?bitbucket\.org\/ Optionally match git@, then match bitbucket.org/
  • ([^/\s]*)\/ Capture group 1, match 0 or more non-whitespace chars other than /, then match the /
  • ( Capture group 2
    • [^/.\s]* Match 0 or more non-whitespace chars except / and .
    • (?: Non-capturing group (to repeat it as a whole)
      • \.(?!git\b) Match a dot if it is not directly followed by the word git
      • [^/.\s]* Match 0 or more non-whitespace chars except / and .
    • )* Close the non capture group and optionally repeat it
  • ) Close group 2

See a regex demo.
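A Python sketch of the lookahead pattern, assuming Python's re module. The slugs my.repo and test-x.github are hypothetical and show that dots are kept unless they start a trailing .git; because of the \b in the lookahead, a dot followed by github is still allowed.

```python
import re

# Pattern from the answer: a "." is allowed in the slug unless it is followed by the word "git".
PATTERN = re.compile(
    r"^(?:ssh|https):\/\/(?:git@)?bitbucket\.org\/([^/\s]*)\/([^/.\s]*(?:\.(?!git\b)[^/.\s]*)*)"
)

urls = [
    "https://bitbucket.org/test/test-x.git",
    "ssh://git@bitbucket.org/test/test-x",
    "https://bitbucket.org/test/my.repo.git",    # hypothetical slug containing a dot
    "https://bitbucket.org/test/test-x.github",  # hypothetical: ".github" is not ".git"
]
for url in urls:
    print(PATTERN.match(url).groups())
# ('test', 'test-x')
# ('test', 'test-x')
# ('test', 'my.repo')
# ('test', 'test-x.github')
```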
