I have text like:
"abababba"
I want to extract the characters between the "a"s as a list.
For the above text, I am expecting output like:
['b', 'b', 'bb']
I have used:
re.split(r'^a(.*?)a$', data)
But it doesn't work.
CodePudding user response:
You could use re.findall to return the capture group values with the pattern:
a([^\sa]+)(?=a)
a matches a literal a char.
([^\sa]+) is capture group 1: it matches one or more chars other than a or a whitespace char (use [^a]+ instead if you do want to match spaces).
(?=a) is a positive lookahead that asserts an a directly to the right without consuming it.
import re
pattern = r"a([^\sa]+)(?=a)"
s = "abababba"
print(re.findall(pattern, s))
Output
['b', 'b', 'bb']
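To illustrate why the closing a sits inside a lookahead, here is a small sketch comparing that pattern with a variant that consumes the closing a. The variable names are only for illustration.

```python
import re

s = "abababba"

# Lookahead: the closing "a" is asserted but not consumed,
# so it can also serve as the opening "a" of the next match.
with_lookahead = re.findall(r"a([^\sa]+)(?=a)", s)

# No lookahead: the closing "a" is consumed, so the next match
# must start after it and the run sharing that "a" is skipped.
without_lookahead = re.findall(r"a([^\sa]+)a", s)

print(with_lookahead)     # ['b', 'b', 'bb']
print(without_lookahead)  # ['b', 'bb']
```

Because re.findall only returns non-overlapping matches, the lookahead is what lets adjacent runs share a delimiter.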
CodePudding user response:
You could use a list comprehension to achieve this:
s = "abababba"
l = [x for x in s.split("a") if x != ""]
print(l)
Output:
['b', 'b', 'bb']
CodePudding user response:
The ^ and $ anchors match only at the start and end of the string (or of each line with re.MULTILINE), so your pattern can match at most once, across the whole input. Without the anchors, you get the desired list with:
re.split(r'a(.*?)a', data)[1:-1]
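As a rough sketch of the difference, the snippet below runs both the anchored and the unanchored pattern on the sample string. The names anchored, unanchored and odd_input are assumptions made for this example.

```python
import re

data = "abababba"

# Anchored: the pattern can match the whole string only once, so the
# single capture group holds everything between the outer "a"s.
anchored = re.split(r'^a(.*?)a$', data)

# Unanchored: re.split keeps captured groups in the result,
# interleaved with the text found between matches.
unanchored = re.split(r'a(.*?)a', data)

print(anchored)          # ['', 'bababb', '']
print(unanchored)        # ['', 'b', 'b', 'bb', '']
print(unanchored[1:-1])  # ['b', 'b', 'bb']

# The middle 'b' above is text between two matches, not a captured
# group, so the result depends on the shape of the input:
odd_input = re.split(r'a(.*?)a', "ababa")[1:-1]
print(odd_input)         # ['b']
```

So this approach seems to work for the sample input, but it may drop items on strings where the a delimiters do not pair up evenly.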
CodePudding user response:
Why not use a normal split:
"abababba".split("a") --> ['', 'b', 'b', 'bb', '']
And remove the empty parts as needed:
# remove all empties:
[*filter(None, "abababba".split("a"))] --> ['b', 'b', 'bb']
or
# only leading/trailing empties (if any)
"abababba".strip("a").split("a") --> ['b', 'b', 'bb']
or
# only leading/trailing empties (assuming always enclosed in 'a')
"abababba".split("a")[1:-1] --> ['b', 'b', 'bb']
If you must use a regular expression, perhaps findall() will let you use a simpler pattern while covering all edge cases (ignoring all empties):
re.findall(r"[^a]+", "abababba") --> ['b', 'b', 'bb']
re.findall(r"[^a]+", "abababb") --> ['b', 'b', 'bb']
re.findall(r"[^a]+", "bababb") --> ['b', 'b', 'bb']
re.findall(r"[^a]+", "babaabb") --> ['b', 'b', 'bb']
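The variants above agree on "abababba" but can differ on other inputs. Here is a minimal sketch that runs them side by side. The helper name between_a and the extra test strings are made up for this example.

```python
import re

def between_a(text):
    """Return the non-empty runs of characters that are not 'a'."""
    return re.findall(r"[^a]+", text)

for text in ["abababba", "aabba", "abaab"]:
    print(text)
    print("  split + filter:", [*filter(None, text.split("a"))])
    print("  strip + split: ", text.strip("a").split("a"))
    print("  split[1:-1]:   ", text.split("a")[1:-1])
    print("  findall:       ", between_a(text))
```

For "aabba" the [1:-1] slice keeps an empty string (['', 'bb']), and for "abaab" the strip variant keeps the inner empty string (['b', '', 'b']). The filter and findall variants drop every empty part.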