I'm quite new to Python and regex. I'm almost there, but I haven't been able to fix this issue after 6 hours. Hopefully someone can help.
My string is as follows:
str_1 = '& peers & & apples & & lemon juice & & Strawberries & & Mellon &'
I would like a new list that contains: ['peers','apples','lemon juice','Strawberries','Mellon'], so without all the whitespace and the & signs.
My code is as follows:
list_1 = re.compile(r'(?<=&)(.*?)(?=&)').findall(str_1)
However, I get something like this:
list_1 = [' peers ', ' ', ' apples ', ' ', ' lemon juice ', ' ', ' Strawberries ', ' ', ' Mellon']
Can someone please help to get:
['peers','apples','lemon juice','Strawberries','Mellon']
CodePudding user response:
You don't need regexes for this:
>>> str_1 = '& peers & & apples & & lemon juice & & Strawberries & & Mellon &'
>>> ls = [x.strip() for x in str_1.split('&')]
>>> ls = [x for x in ls if x]
>>> ls
['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']
If you still want a regex, then
>>> re.findall(r'[^& ][^&]*[^& ]', str_1)
['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']
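One caveat with that pattern, shown as a small sketch: it requires a first and a last character that are each neither & nor a space, so it can only match items at least two characters long. The sample string with a one-letter item and the variable names below are made up for illustration and are not from the question.

```python
import re

# Hypothetical input with a one-character item ("a"); not from the question.
sample = '& a & & bc & & lemon juice &'

# Needs a distinct first and last character, so "a" is skipped.
two_char_min = re.findall(r'[^& ][^&]*[^& ]', sample)

# Making everything after the first character optional also keeps "a".
any_length = re.findall(r'[^& ](?:[^&]*[^& ])?', sample)

print(two_char_min)
print(any_length)
```

If every item is known to have two or more characters, the shorter pattern is enough.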
CodePudding user response:
If you have to use a regex, you can use
re.findall(r'[^&\s]+(?:[^&]*[^&\s])?', str_1)
See the regex demo. Details:
[^&\s]+ - one or more chars other than & and whitespace
(?:[^&]*[^&\s])? - an optional sequence of zero or more chars other than &, followed by a char other than & or whitespace.
See the Python demo:
import re
str_1 = "& peers & & apples & & lemon juice & & Strawberries & & Mellon & "
print(re.findall(r'[^&\s]+(?:[^&]*[^&\s])?', str_1))
# => ['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']
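As a sketch of how the parts fit together, the pattern [^&\s]+(?:[^&]*[^&\s])? can also be written with re.VERBOSE so that each part carries its own comment. The name ITEM is just an illustrative choice.

```python
import re

# ITEM is a hypothetical name for the compiled pattern.
ITEM = re.compile(r"""
    [^&\s]+            # one or more chars that are neither & nor whitespace
    (?:                # optional non-capturing group:
        [^&]*          #   any run of non-& chars (may include spaces)
        [^&\s]         #   ending on a char that is neither & nor whitespace
    )?
""", re.VERBOSE)

str_1 = "& peers & & apples & & lemon juice & & Strawberries & & Mellon & "
items = ITEM.findall(str_1)
print(items)
```

Whitespace and comments in the pattern are ignored in verbose mode (outside character classes), so this should behave the same as the one-line version.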
A non-regex solution can look like
[x.strip() for x in str_1.split('&') if x.strip()]
See this Python demo. Here, you split the string on & chars, strip leading/trailing whitespace from each piece, and keep only the pieces that are not empty after stripping.
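If this is needed in more than one place, the split-and-strip approach could be wrapped in a small helper. This is only a sketch: the function name split_items and its sep parameter are assumptions, not part of the question. It strips each piece once instead of twice.

```python
def split_items(text, sep='&'):
    """Split text on sep and return the non-blank pieces, stripped."""
    # Strip each piece once, then drop the ones that end up empty.
    stripped = (piece.strip() for piece in text.split(sep))
    return [piece for piece in stripped if piece]


str_1 = '& peers & & apples & & lemon juice & & Strawberries & & Mellon &'
result = split_items(str_1)
print(result)
```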