Using re to grab all instances of values between parentheses

I'm using Python's re module to grab every value that appears between an opening and a closing parenthesis.

For example, (A)way(Of)testing(This)

would produce a list:

   ['A', 'Of', 'This']

I took a look at 1 and 2.

This is my code:

import re


sentence = "(A)way(Of)testing(This)is running (it)"

res = re.compile(r".*\(([a-zA-Z0-9|^)])\).*", re.S)
for s in re.findall(res, sentence):
    print(s)

What I get from this is:

it

Then I realized I was capturing only one character, so I used

res = re.compile(r".*\(([a-zA-Z0-9-|^)]*)\).*", re.S)

But I still get it

I've always struggled with regex. My understanding of my search string is as follows:

.* (any character)
\( (escapes the opening parenthesis)
( (starts the grouping)
[a-zA-Z0-9-|^)]* (set of characters allowed: a-z, A-Z, 0-9, - *EXCEPT the ")" )
) (closes the grouping)
\) (escapes the closing parenthesis)
.* (anything else)

So in theory it should go through sentence and once it encounters a (, it should copy the contents up until it encounters a ), at which point it should store that into one group. It then proceeds through the sentence.

I even used the following:

  res = re.compile(r".*\(([a-z|A-Z|0-9|-|^)]*)\).*", re.S)

But it still returns only it.

Any help greatly appreciated,

Thanks

User response:

You can shorten the pattern by dropping the .* on both sides and removing the |, ^ and ) from the character class, so that the class lists only the characters you want to match.

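The leading .* is greedy: it first consumes the whole string and then backtracks only as far as the last ( that lets the rest of the pattern match. The whole string is therefore used up by a single match, and since the pattern contains only one capture group, findall returns just one value.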
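To see the effect, here is a small sketch that runs the pattern from the question and checks how much of the string one match covers. The variable names greedy and match are only for illustration.

```python
import re

sentence = "(A)way(Of)testing(This)is running (it)"

# Pattern from the question: greedy .* on both sides of the group.
greedy = re.compile(r".*\(([a-zA-Z0-9-|^)]*)\).*", re.S)

match = greedy.search(sentence)
# One match spans the entire string, so nothing is left for a second match.
print(match.span() == (0, len(sentence)))  # True
# Group 1 holds only the last parenthesized value.
print(greedy.findall(sentence))  # ['it']
```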

In your explanation of [a-zA-Z0-9-|^)]*, the |^) part does not exclude the ). Inside a character class these are literal characters, so the class simply matches a |, a ^ or a ) as well.
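A quick check of how these characters behave inside square brackets, using two throwaway patterns (literal_class and negated_class are names made up for this sketch):

```python
import re

# Inside [...], "|", ")" and a non-leading "^" are plain literal characters.
literal_class = re.compile(r"[a-z|^)]")
literal_hits = literal_class.findall("a|^)")
print(literal_hits)  # ['a', '|', '^', ')']

# "^" negates the class only when it is the first character after "[".
negated_class = re.compile(r"[^)]")
negated_hits = negated_class.findall("a)b")
print(negated_hits)  # ['a', 'b']
```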

If you want to use a negated character class, the ^ has to come first inside the brackets, as in [^...]. That is not necessary here, though, because you can specify what you want to match instead of what you don't want to match.

\(([a-zA-Z0-9-]*)\)

The pattern matches:

\( Match (
( Capture group 1
[a-zA-Z0-9-]* Optionally repeat matching one of the listed ranges a-zA-Z0-9 or -
) Close group 1
\) Match )
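For comparison, this is roughly what the negated-class variant would look like if you wanted "anything except )" between the parentheses. It is an alternative sketch, not the pattern recommended above, and any_but_close is just an illustrative name.

```python
import re

sentence = "(A)way(Of)testing(This)is running (it)"

# [^)]* matches any run of characters that are not a closing parenthesis.
any_but_close = re.compile(r"\(([^)]*)\)")

values = any_but_close.findall(sentence)
print(values)  # ['A', 'Of', 'This', 'it']

# Unlike [a-zA-Z0-9-]*, this also accepts spaces and punctuation.
loose = any_but_close.findall("(a b)(c, d)")
print(loose)  # ['a b', 'c, d']
```

The explicit class is stricter, which is usually what you want when the allowed characters are known in advance.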

You don't need re.S, as the pattern contains no dot that would have to match a newline.

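import re

sentence = "(A)way(Of)testing(This)is running (it)"
# Match a literal "(", capture letters, digits or hyphens, then a literal ")"
res = re.compile(r"\(([a-zA-Z0-9-]*)\)")
# findall returns the contents of group 1 for every match
print(res.findall(sentence))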

Output

['A', 'Of', 'This', 'it']
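One detail that may matter for your data: because the class is repeated with *, empty parentheses also match and give an empty string. If that is not wanted, + requires at least one character. A small sketch with a made-up input string:

```python
import re

text = "()(x)(y-1)"  # hypothetical input containing an empty pair

# * allows zero characters, so "()" yields an empty string.
with_star = re.findall(r"\(([a-zA-Z0-9-]*)\)", text)
print(with_star)  # ['', 'x', 'y-1']

# + needs at least one character, so "()" is skipped.
with_plus = re.findall(r"\(([a-zA-Z0-9-]+)\)", text)
print(with_plus)  # ['x', 'y-1']
```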