Regex capture between certain characters

Time:10-04

I'm quite new to Python and regex. I'm almost there, but after 6 hours I still can't fix this issue. Hopefully someone can help.

My string is as follows:

str_1 = '& peers & & apples & & lemon juice & & Strawberries & & Mellon &'

I would like a new list that contains: ['peers','apples','lemon juice','Strawberries','Mellon']. So without all the whitespace and the & signs.

My code is as follows:

list_1 = re.compile(r'(?<=&)(.*?)(?=&)').findall(str_1)

However, I get something like this:

list_1 = [' peers ', ' ', ' apples ', ' ', ' lemon juice ', ' ', ' Strawberries ', ' ', ' Mellon ']

Can someone please help to get:

['peers','apples','lemon juice','Strawberries','Mellon']

CodePudding user response:

You don't need a regex for this:

>>> str_1 =  '& peers & & apples & & lemon juice & & Strawberries & & Mellon &'
>>> ls = [x.strip() for x in str_1.split('&')]
>>> ls = [x for x in ls if x]
>>> ls
['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']

If you still want a regex, you can use:

>>> re.findall(r'[^& ][^&]*[^& ]', str_1)
['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']
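One caveat worth checking: this pattern needs a separate first and last character, so an item that is only one character long would not match. The sketch below uses a made-up input (the `sample` string and the item `x` are assumptions, not part of the question) to show the difference when the tail is made optional.

```python
import re

# Hypothetical input with a single-character item ("x"); not from the question.
sample = '& peers & & x & & lemon juice &'

# Original pattern: requires at least two characters per item, so "x" is skipped.
strict = re.findall(r'[^& ][^&]*[^& ]', sample)
print(strict)   # ['peers', 'lemon juice']

# Variant: everything after the first character is optional, so "x" is kept.
relaxed = re.findall(r'[^& ](?:[^&]*[^& ])?', sample)
print(relaxed)  # ['peers', 'x', 'lemon juice']
```

For the string in the question every item has at least two characters, so both patterns return the same list.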

CodePudding user response:

If you have to use a regex, you can use

re.findall(r'[^&\s]+(?:[^&]*[^&\s])?', str_1)

Details:

  • [^&\s]+ - one or more chars other than & and whitespace
  • (?:[^&]*[^&\s])? - an optional sequence of zero or more chars other than &, followed by a char other than & or whitespace.

See the Python demo:

import re
str_1 = "& peers & & apples & & lemon juice & & Strawberries & & Mellon & "
print(re.findall(r'[^&\s]+(?:[^&]*[^&\s])?', str_1))
# => ['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']
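The same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: it assumes the pattern begins with `[^&\s]+` (one or more), as the details describe, and the name `ITEM` and the second test string are made up for illustration.

```python
import re

# Hypothetical name ITEM; the pattern itself is the one from the answer.
ITEM = re.compile(r"""
    [^&\s]+        # one or more chars that are neither an ampersand nor whitespace
    (?:            # optional tail:
        [^&]*      #   any chars except an ampersand (inner spaces allowed)
        [^&\s]     #   ending on a char that is neither an ampersand nor whitespace
    )?
""", re.VERBOSE)

str_1 = "& peers & & apples & & lemon juice & & Strawberries & & Mellon & "
items = ITEM.findall(str_1)
print(items)  # ['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']

# Single-character items also match, because the tail is optional.
short_items = ITEM.findall("& a & b c &")
print(short_items)  # ['a', 'b c']
```

Since the group is non-capturing, findall returns the whole match for each item rather than the group contents.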

A non-regex solution can look like

[x.strip() for x in str_1.split('&') if x.strip()]

Here, the string is split on & and each piece is stripped of leading/trailing whitespace; pieces that are empty or contain only whitespace are dropped.
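If the split-and-strip approach is needed in more than one place, it could be wrapped in a small helper. The function name `split_items` and the `sep` parameter are assumptions added for illustration; the logic is the same split, strip and filter.

```python
def split_items(text, sep='&'):
    """Hypothetical helper: split on sep, strip each piece, drop empty pieces."""
    stripped = (part.strip() for part in text.split(sep))
    return [part for part in stripped if part]


str_1 = "& peers & & apples & & lemon juice & & Strawberries & & Mellon & "
print(split_items(str_1))
# ['peers', 'apples', 'lemon juice', 'Strawberries', 'Mellon']
```

Stripping once in a generator and filtering afterwards avoids calling strip() twice per piece, which the one-line version does.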
