I have the following path stored as a Python string: 'C:\ABC\DEF\GHI\App\Module\feature\src'. I would like to extract the word Module, which is located between \App\ and \feature\ in the path name. Note that the file separators '\' in between should not be extracted; only the string Module should be.
I had a few ideas on how to do it:
- Write a RegEx that matches the string between \App\ and \feature\.
- Write a RegEx that matches the string after \App\ (for example App\\[A-Za-z0-9]*\\), and then split that matched string in order to find Module.
I think the first solution is better, but unfortunately it goes beyond my RegEx knowledge and I am not sure how to do it.
I would much appreciate any help.
Thank you in advance!
CodePudding user response:
You are looking for groups. With some small modifications you can extract only the part between App and feature.
(?:App\\\\)([A-Za-z0-9]*)(?:\\\\feature)
The parentheses ( ) define a capturing group, which you can retrieve with match.group(1). Using (?:foo) defines a non-capturing group, i.e. one that is not included in your result. Try the expression here: https://regex101.com/r/24mkLO/1
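As a rough sketch of how the capturing group could be used from Python: the pattern below is written as a raw string, where two backslashes match one literal backslash (four would be needed in an ordinary, non-raw string literal). The variable names path, pattern and match are chosen here for illustration only.

```python
import re

path = r'C:\ABC\DEF\GHI\App\Module\feature\src'

# Raw string: \\ in the pattern matches a single literal backslash.
pattern = r'(?:App\\)([A-Za-z0-9]*)(?:\\feature)'

match = re.search(pattern, path)
if match:
    print(match.group(0))  # whole match: App\Module\feature
    print(match.group(1))  # capturing group only: Module
```

group(0) still contains the text of the non-capturing groups; only group(1) is limited to the part in parentheses.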
CodePudding user response:
We can do that with str.find, something like:
path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
start = '\\App\\'
end = '\\feature\\'
# Slice from just after the first start marker up to the last end marker.
print(path[path.find(start) + len(start):path.rfind(end)])
Output:
Module
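One thing to keep in mind with this approach is that find and rfind return -1 when a marker is missing, which would silently produce a wrong slice. A minimal sketch of a guarded version follows; the helper name between is hypothetical, and unlike the snippet above it looks for the first end marker after the start marker rather than the last one in the string.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None  # start marker not present
    i += len(start)
    j = text.find(end, i)  # only look for `end` after `start`
    if j == -1:
        return None  # end marker not present
    return text[i:j]


path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
print(between(path, '\\App\\', '\\feature\\'))  # Module
print(between(path, '\\App\\', '\\missing\\'))  # None
```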
CodePudding user response:
The regex you want is:
(?<=\\App\\).*?(?=\\feature\\)
Explanation of the regex:
- (?<=behind)rest matches all instances of rest that have behind immediately before them. This is called a positive lookbehind.
- rest(?=ahead) matches all instances of rest that have ahead immediately after them. This is a positive lookahead.
- \ is a reserved character in regex patterns, so to use it as part of the pattern itself we have to escape it; hence \\.
- .* matches any character (except a newline, by default), zero or more times.
- ? specifies that the match is not greedy, so it stops at the first \feature\ after \App\ (we are implicitly assuming here that \feature\ only shows up once after \App\).
- The pattern in general also assumes that there are no \ characters between \App\ and \feature\, since .*? would match them as well.
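To make the last two points concrete, here is a small sketch comparing the lazy and greedy forms. Both paths are made up for the demonstration: the first contains \feature\ twice, the second has an extra directory level between the two markers.

```python
import re

# Hypothetical path in which \feature\ appears twice after \App\.
path = r'C:\App\Module\feature\x\feature\src'

lazy = re.search(r'(?<=\\App\\).*?(?=\\feature\\)', path)
greedy = re.search(r'(?<=\\App\\).*(?=\\feature\\)', path)

print(lazy[0])    # Module
print(greedy[0])  # Module\feature\x

# Hypothetical path with an extra level between the markers:
# .*? happily matches the backslash as well.
nested = re.search(r'(?<=\\App\\).*?(?=\\feature\\)', r'C:\App\a\b\feature\src')
print(nested[0])  # a\b
```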
The full code would be something like:
import re

path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
start = '\\App\\'
end = '\\feature\\'
# Each literal \ in the raw f-string pairs with a backslash from start/end to form an escaped \\.
pattern = rf"(?<=\{start}\).*?(?=\{end}\)"
print(pattern) # (?<=\\App\\).*?(?=\\feature\\)
print(re.search(pattern, path)[0]) # Module
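A possible variation, offered as a sketch rather than a drop-in replacement: re.escape can build the escaped markers instead of hand-placing backslashes, and [^\\]+ can replace .*? so that the match is limited to a single path component. The function name extract_between is hypothetical.

```python
import re


def extract_between(path, start, end):
    """Return the single path component between `start` and `end`, or None."""
    # re.escape turns each backslash in the markers into an escaped \\.
    # [^\\]+ matches one or more characters that are not a backslash.
    pattern = rf"(?<={re.escape(start)})[^\\]+(?={re.escape(end)})"
    match = re.search(pattern, path)
    return match[0] if match else None


path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
print(extract_between(path, '\\App\\', '\\feature\\'))  # Module

# With an extra directory level between the markers there is no match.
print(extract_between('C:\\App\\a\\b\\feature\\src', '\\App\\', '\\feature\\'))  # None
```

Returning None instead of indexing the match directly avoids a TypeError when the markers are absent.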
A link on regex lookarounds that may be helpful: https://www.regular-expressions.info/lookaround.html