Need help extracting a list from a string using regex

Time:07-11

I am having trouble extracting these two lists, ['item1', 'item2', 'item3'] and ['item4', 'item5'], from a longer string. Here is the code I have attempted:

import re

pattern_matcher = re.compile(r"(\[,\]])")

alien_string = "Offering ['item1', 'item2', 'item3'] (4331)[6785]    Requesting ['item4', 'item5'] (6998)[6766]. {0.255457} (0 left in queue)..."

matches = pattern_matcher.fullmatch(alien_string)

print(matches)

The output I received was

Output: None

I would like to know how to extract these two lists, specifically ['item1', 'item2', 'item3'] and ['item4', 'item5'], from this long string.

CodePudding user response:

import re

pattern = re.compile(r"\[[^\]]+?(?:, [^\]]+?)+\]")

alien_string = "Offering ['item1', 'item2', 'item3'] (4331)[6785]    Requesting ['item4', 'item5'] (6998)[6766]. {0.255457} (0 left in queue)..."

matches = re.findall(pattern, alien_string)

for match in matches:
    print(match)

Result:

['item1', 'item2', 'item3']
['item4', 'item5']

Explanation:

  • \[ matches the literal opening bracket
  • [^\]] matches any single character that isn't a closing bracket
  • [^\]]+? lazily matches one or more non-bracket characters, stopping at the next comma
  • (?: starts a non-capturing group, so findall returns each whole match rather than only the group's contents

The rest should be self-explanatory.
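The matches printed above are still strings. As a minimal sketch, assuming each bracketed chunk is a valid Python list literal, they can be turned into real lists with ast.literal_eval. The pattern is written with re.VERBOSE so that each piece from the explanation carries its own comment. The names LIST_PATTERN and extract_lists are made up for this illustration and are not part of the original answer.

```python
import ast
import re

# Hypothetical names for illustration. The pattern requires at least one ", "
# inside the brackets, so single-value brackets such as [6785] are skipped.
LIST_PATTERN = re.compile(
    r"""
    \[                # literal opening bracket
    [^\]]+?           # first item: one or more non-bracket characters, lazily
    (?:,\ [^\]]+?)+   # one or more further items, each preceded by ", "
    \]                # literal closing bracket
    """,
    re.VERBOSE,
)


def extract_lists(text):
    """Return every multi-item bracketed list found in text as a Python list."""
    # literal_eval only evaluates literals, so it is safer than eval here.
    return [ast.literal_eval(match) for match in LIST_PATTERN.findall(text)]


alien_string = "Offering ['item1', 'item2', 'item3'] (4331)[6785]    Requesting ['item4', 'item5'] (6998)[6766]. {0.255457} (0 left in queue)..."

print(extract_lists(alien_string))
# [['item1', 'item2', 'item3'], ['item4', 'item5']]
```

Note that a list with a single item, such as ['item6'], contains no ", " and would not be matched by this pattern.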

CodePudding user response:

You could try to split the line and match each part:

import re
alien_string = "Offering ['item1', 'item2', 'item3'] (4331)[6785]    Requesting ['item4', 'item5'] (6998)[6766]. {0.255457} (0 left in queue)..."
parts = alien_string.split('    ')
lists = [re.search(r'(\[(.*?)\])', part).group(0) for part in parts]
print(lists)

Output:

["['item1', 'item2', 'item3']", "['item4', 'item5']"]
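The split approach relies on the two halves being separated by exactly four spaces, and re.search returns None (so .group(0) raises AttributeError) for any part without brackets. A possible alternative, sketched under the assumption that every wanted list starts with a quoted item, is to match only brackets whose first character is a single quote. QUOTED_LIST and find_quoted_lists are hypothetical names used only for this example.

```python
import re

# Assumption: the lists of interest always begin with a quoted item, e.g. ['item1', ...].
# Numeric brackets such as [6785] start with a digit and are therefore ignored.
QUOTED_LIST = re.compile(r"\['[^\]]*\]")


def find_quoted_lists(text):
    """Return each bracketed chunk that starts with a single quote, as a string."""
    return QUOTED_LIST.findall(text)


alien_string = "Offering ['item1', 'item2', 'item3'] (4331)[6785]    Requesting ['item4', 'item5'] (6998)[6766]. {0.255457} (0 left in queue)..."

print(find_quoted_lists(alien_string))
# ["['item1', 'item2', 'item3']", "['item4', 'item5']"]

# Single-item lists are found too, and no splitting is needed.
print(find_quoted_lists("Offering ['item6'] (1)[2]"))
# ["['item6']"]
```

This sketch would cut a match short if an item itself contained a closing bracket, so it only suits inputs like the one shown.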