I am trying to find all substrings within a multi string in python 3, I want to find all words in between the word 'Colour:':
example string:
str = """
Colour: Black
Colour: Green
Colour: Black
Colour: Red
Colour: Orange
Colour: Blue
Colour: Green
"""
I want to get all of the colours into a list like:
x = ['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
I want to do this using Python re
Whats the fastest way of doing this with re.search , re.findall, re.finditer or even another method.
I've tried doing this as a list comprehension:
z = [x.group() for x in re.finditer('Colour:(.*?)Colour:', str)]
but it returns an empty list ?
any ideas?
CodePudding user response:
In regex, the dot .
does not match new line by default. This mean your program is trying to find something like "Color: blueColor".
To overcome this, you can just do something like :
colours = re.findall(r'Colour: (. )', str)
Note the use of re.findall
to avoid using the list comprehension.
Furthermore, if the format won't change, regex is not mandatory and you can just split each line on spaces and get the second part :
colours = [line.split()[1] for line in str.splitlines()]
CodePudding user response:
The lists containing the trailing spaces can be removed and split based on the user-defined variable. In your case, the Colour:
.
list(filter(None, str.replace("\n", "").replace(" ", "").split("Colour:")))
Result:
['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
Regard to time constraints:
Regex patterns are subjected to taking more time than dealing with strings directly.
Adding the image for reference:
CodePudding user response:
Perhaps you just need a simple one-liner:
x = re.findall("Colour: (.*)",str)
This worked for your example.
(P.S. please don't use builtin symbols like str
for variable names.)