How to extract the value between the key using RegEx?

I have text like:

"abababba"

I want to extract, as a list, the runs of characters that sit between each a. For the above text, I am expecting output like:

['b', 'b', 'bb']

I have used:

re.split(r'^a(.*?)a$', data)

But it doesn't work.

CodePudding user response：

You could use re.findall to return the capture group values with the pattern:

a([^\sa]+)(?=a)

a Match an a char
([^\sa]+) Capture group 1, match one or more chars other than a or a whitespace char (drop the \s if spaces should be matched too)
(?=a) Positive lookahead, assert an a directly to the right without consuming it

import re

pattern = r"a([^\sa]+)(?=a)"
s = "abababba"

print(re.findall(pattern, s))

Output

['b', 'b', 'bb']
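
To see why the lookahead matters, here is a small sketch comparing it with a variant that consumes the closing a. The names consuming and lookahead are just illustrative; the input is the string from the question.

```python
import re

s = "abababba"

# Without the lookahead, the closing "a" is consumed by the match,
# so it cannot also open the next match and the middle run is skipped.
consuming = re.findall(r"a([^\sa]+)a", s)

# With the lookahead, the closing "a" is only asserted, not consumed,
# so the next match can start on that same "a".
lookahead = re.findall(r"a([^\sa]+)(?=a)", s)

print(consuming)  # ['b', 'bb']
print(lookahead)  # ['b', 'b', 'bb']
```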

CodePudding user response：

You could use a list comprehension to achieve this:

s = "abababba"
parts = [x for x in s.split("a") if x != ""]
print(parts)

Output:

['b', 'b', 'bb']

CodePudding user response：

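The ^ and $ anchors only match at the start and end of the string (or of each line with re.MULTILINE), so your pattern can match at most once and captures everything between the first and the last a. Without the anchors, you get the desired list for this input by using the line: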

re.split(r'a(.*?)a', data)[1:-1]
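
A quick sketch of what both patterns return may help. The "ababa" input is a hypothetical extra case, not from the question, added to show that the unanchored split depends on how the a delimiters pair up, so treat the slice trick as specific to inputs like the original one.

```python
import re

data = "abababba"

# Anchored: a single match spans the whole string, so the only
# captured group is everything between the first and last "a".
anchored = re.split(r'^a(.*?)a$', data)
print(anchored)    # ['', 'bababb', '']

# Unanchored: each match consumes a pair of "a"s; the text left
# between matches and the captured groups interleave in the result.
unanchored = re.split(r'a(.*?)a', data)
print(unanchored)  # ['', 'b', 'b', 'bb', '']

# Hypothetical input where the pairing does not line up:
# the second run is lost once the ends are sliced off.
other = re.split(r'a(.*?)a', "ababa")[1:-1]
print(other)       # ['b']
```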

CodePudding user response：

Why not use a normal split?

"abababba".split("a") --> ['', 'b', 'b', 'bb', '']

And remove the empty parts as needed:

# remove all empties:

[*filter(None,"abababba".split("a"))] --> ['b', 'b', 'bb']

# only leading/trailing empties (if any)

"abababba".strip("a").split("a") --> ['b', 'b', 'bb']

# only leading/trailing empties (assuming always enclosed in 'a')

"abababba".split("a")[1:-1]  --> ['b', 'b', 'bb']

If you must use a regular expression, perhaps findall() will let you use a simpler pattern while covering all edge cases (ignoring all empties):

re.findall(r"[^a]+","abababba") --> ['b', 'b', 'bb']
re.findall(r"[^a]+","abababb")  --> ['b', 'b', 'bb']
re.findall(r"[^a]+","bababb")   --> ['b', 'b', 'bb']
re.findall(r"[^a]+","babaabb")  --> ['b', 'b', 'bb']
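
If the key character varies, the same idea can be wrapped in a small helper. This is only a sketch: runs_between is a made-up name, and it assumes the key is a single character. Note that, like the findall() calls above, it also returns runs at the start or end that are not enclosed by the key on both sides.

```python
import re

def runs_between(text, key="a"):
    """Return the non-empty runs of characters separated by `key` (one character)."""
    # re.escape keeps characters such as "." or "^" literal inside the class.
    return re.findall(f"[^{re.escape(key)}]+", text)

samples = ["abababba", "abababb", "bababb", "babaabb"]
results = [runs_between(s) for s in samples]

for s, r in zip(samples, results):
    print(s, r)  # each line ends with ['b', 'b', 'bb']
```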