I'm new to Python and I have this string:
row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
I need to get all the data inside the aa tags:
hello,great,later
My code is:
allAA =[]
patternAA = "<aa>(.*)</aa>"
allAA = '\'' + (re.search(patternAA, str(row))).groups() + '\','
And I get this result: <aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa>
How can I get the data I need?
CodePudding user response:
You can use re.findall(), which returns a list of all matches for your regular expression:
import re
row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
allAA = re.findall(r'<aa>(.*?)</aa>', row)
print(allAA) # ['hello', 'great', 'later']
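If you want the single comma-separated string from the question rather than a list, one option is to join the matches. This is only a minimal sketch; the variable name `joined` is made up for illustration.

```python
import re

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

# With one capture group, findall returns the captured text of every match
allAA = re.findall(r"<aa>(.*?)</aa>", row)

# Join the list into one string (the name `joined` is hypothetical)
joined = ",".join(allAA)
print(joined)  # hello,great,later
```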
CodePudding user response:
There are two issues with your code:
- You need to use a non-greedy capture group, specified by adding ? after the quantifier (.*? instead of .*).
- You should use re.findall() to get all the captured groups, rather than re.search(), which returns only the first match.
With these two fixes, we get the following:
import re
row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
patternAA = re.compile(r"<aa>(.*?)</aa>")
result = re.findall(patternAA, row)
# Prints ['hello', 'great', 'later']
print(result)
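To see why the ? matters, it may help to run both patterns side by side on the same string. A small sketch; the names `greedy` and `lazy` are just for this comparison.

```python
import re

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

# Greedy: .* runs on to the LAST </aa>, so there is one long match
greedy = re.findall(r"<aa>(.*)</aa>", row)
print(greedy)
# ['hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later']

# Non-greedy: .*? stops at the FIRST </aa> after each <aa>
lazy = re.findall(r"<aa>(.*?)</aa>", row)
print(lazy)  # ['hello', 'great', 'later']
```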
CodePudding user response:
I'm not sure if you specifically want to use re, but here is a solution that works if you are not too worried about efficiency:
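def solution(row, separator1, separator2):
    output = ""
    # Keep going while both the opening and closing separators are present
    while row.find(separator1) != -1 and row.find(separator2) != -1:
        ind1 = row.index(separator1)
        ind2 = row.index(separator2)
        # Append the text between the separators, followed by a comma
        output += row[ind1 + len(separator1): ind2] + ","
        # Drop everything up to and including the closing separator
        row = row[ind2 + len(separator2):]
    # Strip the trailing comma
    return output[:-1]

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
print(solution(row, "<aa>", "</aa>"))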
Output:
hello,great,later
While both "<aa>" and "</aa>" exist in the string, we find the index of each, append the text between them to output, cut everything up to and including "</aa>" from row, and continue.
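The same idea can also be sketched without re-slicing row on every pass, by giving str.find() a start position. This is a variant, not the code above: the function name `extract_between` and its parameter names are made up, and it looks for the closing tag only after the opening tag it just found.

```python
def extract_between(text, start_tag, end_tag):
    """Collect every substring found between start_tag and end_tag."""
    pieces = []
    pos = 0  # where the next search begins
    while True:
        start = text.find(start_tag, pos)
        if start == -1:
            break  # no more opening tags
        start += len(start_tag)
        # Look for the closing tag only after the opening tag
        end = text.find(end_tag, start)
        if end == -1:
            break  # opening tag without a closing tag
        pieces.append(text[start:end])
        pos = end + len(end_tag)
    return ",".join(pieces)

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
print(extract_between(row, "<aa>", "</aa>"))  # hello,great,later
```

Building a list and joining it at the end also avoids having to strip a trailing comma.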
I hope this helped! Please let me know if you need any further clarifications or details :)