I'm using Python's re module to grab every value that sits between an opening and a closing parenthesis.
For example, (A)way(Of)testing(This)
should produce the list:
['A', 'Of', 'This']
This is my code:
import re
sentence = "(A)way(Of)testing(This)is running (it)"
res = re.compile(r".*\(([a-zA-Z0-9|^)])\).*", re.S)
for s in re.findall(res, sentence):
    print(s)
What I get from this is:
it
Then I realized I was capturing only one character, so I used
res = re.compile(r".*\(([a-zA-Z0-9-|^)]*)\).*", re.S)
But I still get it
I've always struggled with regex. My understanding of my search string is as follows:
.* (any character)
\( (escapes the opening parenthesis)
( (starts the grouping)
[a-zA-Z0-9-|^)]* (set of characters allowed: a-z, A-Z, 0-9, - *EXCEPT the ")")
) (closes the grouping)
\) (escapes the closing parenthesis)
.* (anything else)
So in theory it should go through sentence and, once it encounters a (, copy the contents up until it encounters a ), at which point it should store that into one group. It then proceeds through the rest of sentence.
I even used the following:
res = re.compile(r".*\(([a-z|A-Z|0-9|-|^)]*)\).*", re.S)
But it still returns it.
Any help greatly appreciated,
Thanks
CodePudding user response:
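You can shorten the pattern by dropping the .* at both ends and the |, ^ and ) inside the brackets, and using only the character class.
The leading .* is greedy: it consumes as many characters as it can, so the whole pattern matches the entire string in a single match. Because the capture group appears only once in the pattern, that single match yields only one captured value, the last parenthesized one (it).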
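To see the effect of the surrounding .* directly, here is a small sketch that compares the two patterns on the sentence from the question. The variable names (wrapped, bare and so on) are only for illustration.

```python
import re

sentence = "(A)way(Of)testing(This)is running (it)"

# Pattern wrapped in .* on both sides: one match swallows the whole string.
wrapped = re.compile(r".*\(([a-zA-Z0-9-]*)\).*", re.S)
whole_span = wrapped.match(sentence).span()
wrapped_result = wrapped.findall(sentence)

# Same group without the .* parts: every parenthesized value is its own match.
bare = re.compile(r"\(([a-zA-Z0-9-]*)\)")
bare_result = bare.findall(sentence)

print(whole_span == (0, len(sentence)))  # True
print(wrapped_result)                    # ['it']
print(bare_result)                       # ['A', 'Of', 'This', 'it']
```

Since the single match already covers the full string, findall has nothing left to scan afterwards, which is why only one value comes back.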
In your explanation of the [a-zA-Z0-9-|^)]* part, the character class does not rule out the ) by using |^). Inside a character class those are plain literals, so it will just match either a |, a ^ or a ) character.
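A quick way to check this is to test the class from the question against those three characters. The sketch below uses a made-up input, "(a)b)", to show how the ) inside the class lets the capture run past the first closing parenthesis.

```python
import re

# The character class exactly as written in the question.
asker_class = re.compile(r"[a-zA-Z0-9-|^)]")

# Each of |, ^ and ) is accepted as an ordinary member of the class.
literal_hits = [ch for ch in "|^)" if asker_class.fullmatch(ch)]
print(literal_hits)  # ['|', '^', ')']

# Because ) is allowed inside the group, the capture can include it.
leaky = re.findall(r"\(([a-zA-Z0-9-|^)]*)\)", "(a)b)")
print(leaky)  # ['a)b']
```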
If you want to use a negated character class, the ^ should be at the start of the character class, like [^ but that is not necessary here, as you can specify what you want to match instead of what you don't want to match.
\(([a-zA-Z0-9-]*)\)
The pattern matches:
\( Match (
( Capture group 1
[a-zA-Z0-9-]* Optionally repeat matching one of the listed ranges a-zA-Z0-9 or -
) Close group 1
\) Match )
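One consequence of "optionally repeat" (the * quantifier) is that empty parentheses also match and produce an empty string. If that is not wanted, + could be used instead. A small sketch, with a made-up input string:

```python
import re

text = "()(x)"  # hypothetical input with one empty pair of parentheses

# * allows zero characters inside the parentheses, so () yields ''.
with_star = re.findall(r"\(([a-zA-Z0-9-]*)\)", text)
print(with_star)  # ['', 'x']

# + requires at least one character, so () is skipped.
with_plus = re.findall(r"\(([a-zA-Z0-9-]+)\)", text)
print(with_plus)  # ['x']
```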
You don't need re.S, as there is no dot in the pattern that should match a newline.
import re
sentence = "(A)way(Of)testing(This)is running (it)"
res = re.compile(r"\(([a-zA-Z0-9-]*)\)")
print(re.findall(res, sentence))
Output
['A', 'Of', 'This', 'it']
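For comparison, the negated character class mentioned earlier would look like [^)]*, meaning "anything except )". It gives the same result on this sentence, but it also accepts spaces and punctuation that the explicit ranges reject. The second input string below is made up for illustration.

```python
import re

sentence = "(A)way(Of)testing(This)is running (it)"

allow_list = re.compile(r"\(([a-zA-Z0-9-]*)\)")  # pattern from the answer
negated = re.compile(r"\(([^)]*)\)")             # anything except )

same_result = negated.findall(sentence)
print(same_result)  # ['A', 'Of', 'This', 'it']

# Hypothetical input containing a space, ! and _ inside the parentheses.
other = "(a b!)(c_d)"
negated_other = negated.findall(other)
allow_other = allow_list.findall(other)
print(negated_other)  # ['a b!', 'c_d']
print(allow_other)    # []
```

Which one fits depends on whether the values in parentheses are known to be limited to letters, digits and hyphens.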