I have path strings like these:
tree/bee.horse_2021/moose/loo.se
bee.horse_2021/moose/loo.se
bee.horse_2021/mo.ose/loo.se
The path can be arbitrarily long after moose. Sometimes the first part of the path, such as tree/, is missing, sometimes not. I want to capture tree in the first group if it exists and bee.horse in the second.
I came up with this regex, but it doesn't work:
path_regex = r'^(?:(.*)/)?([a-zA-Z]+\.[a-zA-Z]+).+$'
What am I missing here?
CodePudding user response:
You can restrict the characters to be matched in the first capture group.
For example, you could match any character except / or . using a negated character class [^/\n.]:
^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$
Or you can restrict the first group to word characters (\w) only:
^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$
Note that in your pattern, the .+ at the end matches at least a single character. If you want to make that part optional, you can change it to .*
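As a quick check, here is a minimal Python sketch that runs the negated character class pattern against the three sample paths from the question. The names PATH_REGEX and split_path are only illustrative and are not from the original post.

```python
import re

# Pattern from this answer: the optional first group may not contain
# "/", "." or a newline, so it cannot swallow "bee.horse_2021/moose".
PATH_REGEX = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')


def split_path(path):
    """Return (optional_prefix, name), or None if the path does not match.

    split_path is a hypothetical helper used only for this illustration.
    """
    m = PATH_REGEX.match(path)
    if m is None:
        return None
    # group(1) is None when the optional leading directory is absent.
    return m.group(1), m.group(2)


for p in (
    "tree/bee.horse_2021/moose/loo.se",
    "bee.horse_2021/moose/loo.se",
    "bee.horse_2021/mo.ose/loo.se",
):
    print(p, "->", split_path(p))

# tree/bee.horse_2021/moose/loo.se -> ('tree', 'bee.horse')
# bee.horse_2021/moose/loo.se -> (None, 'bee.horse')
# bee.horse_2021/mo.ose/loo.se -> (None, 'bee.horse')
```

When the leading directory is missing, the first group is simply None, so the caller can test for it directly.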
CodePudding user response:
You are missing the escape character (\) before the / in the regex. It should be:
path_regex = r'^(?:(.*)\/)?([a-zA-Z]+\.[a-zA-Z]+).+$'
I tested it here and it works: https://regex101.com/r/ea9xZE/1/