With Python, I want to use a regular expression (or any other approach) to find all strings between
£ and £
$ and $
[ and ]
sample_text = "All £blocker£ §YTile Blockers§! have been cleared from $country$ $status$ [From.From.GetName]."
With this sample text, I want to get the following output:
blocker
country
status
From.From.GetName
CodePudding user response:
Use re.findall:
# -*- coding: utf-8 -*-
import re
sample_text = "All £blocker£ §YTile Blockers§! have been cleared from $country$ $status$ [From.From.GetName]."
matches = re.findall(r'[£§$\[](.*?)[£§$\]]', sample_text)
print(matches) # ['blocker', 'YTile Blockers', 'country', 'status', 'From.From.GetName']
The regex pattern used here says to match:
[£§$\[]   match an opening marker
(.*?)     match and capture in \1 the content
[£§$\]]   then match a closing marker
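Note that the character-class pattern above also accepts § as a marker and lets any opening marker pair with any closing marker, which is why 'YTile Blockers' appears in its output. As a hedged sketch of a stricter variant (the names PAIRED and extract_paired are illustrative, not from the original answer), one alternative per delimiter pair gives the output the question asks for:

```python
import re

sample_text = "All £blocker£ §YTile Blockers§! have been cleared from $country$ $status$ [From.From.GetName]."

# One alternative per delimiter pair, so '£' can only be closed by '£',
# '$' by '$', and '[' by ']'. '§' is deliberately left out.
PAIRED = re.compile(r"£(.*?)£|\$(.*?)\$|\[(.*?)\]")

def extract_paired(text):
    # Exactly one of the three groups participates in each match;
    # the others are None, so pick the one that is not None.
    return [next(g for g in m.groups() if g is not None)
            for m in PAIRED.finditer(text)]

print(extract_paired(sample_text))
# ['blocker', 'country', 'status', 'From.From.GetName']
```

Unlike splitting on spaces, this also keeps content that contains spaces, for example £two words£.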
CodePudding user response:
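([£§$])(\S*)\1
uses a backreference so the closing marker must be the same character as the opening one, and \S* excludes matches that contain spaces. It does not cover the [ and ] pair.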
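A small sketch of how this pattern behaves with re.findall, assuming the same sample text as in the question. Because the pattern has two capture groups, findall returns (marker, content) tuples rather than plain strings, so the content has to be picked out of each tuple:

```python
import re

sample_text = "All £blocker£ §YTile Blockers§! have been cleared from $country$ $status$ [From.From.GetName]."

# \1 forces the closing marker to equal the opening one;
# \S* refuses content that contains whitespace.
pattern = re.compile(r"([£§$])(\S*)\1")

# With two groups, findall yields (marker, content) tuples.
pairs = pattern.findall(sample_text)
contents = [content for _marker, content in pairs]

print(pairs)     # [('£', 'blocker'), ('$', 'country'), ('$', 'status')]
print(contents)  # ['blocker', 'country', 'status']
```

The § block is skipped here only because its content contains a space, and the bracketed name is skipped because [ and ] are not in the character class.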
CodePudding user response:
Here is a solution without regular expressions. It requires implementing a simple parsing function, applying it to the tokenized text, and finally filtering the results to exclude the None values.
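def parse(word):
    # Each entry is either one symbol that both opens and closes a block,
    # or a two-character string holding the opening and closing symbols.
    symbols = ['£', '$', '[]']
    for s in symbols:
        _len = len(s)
        if _len == 1:
            start = word.find(s)
            # Search for the closing symbol after the opening one.
            end = word[start + 1:].find(s)
            if start != -1 and end != -1:
                return word[start + 1:start + end + 1]
        elif _len == 2:
            start = word.find(s[0])
            end = word[start + 1:].find(s[-1])
            if start != -1 and end != -1:
                return word[start + 1:start + end + 1]
    # Falls through and returns None when no marker pair is found.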
sample_text = "All £blocker£ §YTile Blockers§! have been cleared from $country$ $status$ [From.From.GetName]."
result = list(filter(None, map(parse, sample_text.split(' '))))
print(result)  # ['blocker', 'country', 'status', 'From.From.GetName']
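Because the text is split on spaces first, the function above cannot return content that itself contains a space. A possible alternative, still without regular expressions, is to scan the whole string and jump to the matching closing symbol with str.find. This is only a sketch; the names PAIRS and extract are illustrative assumptions:

```python
# Maps each opening symbol to the symbol that closes it.
PAIRS = {"£": "£", "$": "$", "[": "]"}

def extract(text):
    results = []
    i = 0
    while i < len(text):
        closer = PAIRS.get(text[i])
        if closer is not None:
            # Look for the closing symbol after the opening one.
            end = text.find(closer, i + 1)
            if end != -1:
                results.append(text[i + 1:end])
                i = end + 1  # resume scanning after the closing symbol
                continue
        i += 1
    return results

sample_text = "All £blocker£ §YTile Blockers§! have been cleared from $country$ $status$ [From.From.GetName]."
print(extract(sample_text))
# ['blocker', 'country', 'status', 'From.From.GetName']
```

An opening symbol with no matching closing symbol later in the text is simply skipped.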