How to get specific data from a string using pattern

Time:03-01

I'm new to Python and I have this string:

  row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

I need to get all the data inside the aa tags:

hello,great,later

my code is:

 allAA  =[]
 patternAA = "<aa>(.*)</aa>"
 allAA = '\'' + (re.search(patternAA, str(row))).groups() + '\','

and I get this result:

<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa>

How can I get the data I need?

CodePudding user response:

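You can use re.findall(), which returns a list of all matches for your regular expression. When the pattern contains a single capture group, the list holds the captured text of each match: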

import re

row =  "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
allAA = re.findall(r'<aa>(.*?)</aa>', row)

print(allAA) # ['hello', 'great', 'later']
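
If the comma-separated string from the question is what you need rather than a list, the matches can be joined afterwards. A minimal sketch, reusing the row and pattern from above (the names matches and joined are only illustrative):

```python
import re

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

# findall returns the text captured by the single group for every match.
matches = re.findall(r"<aa>(.*?)</aa>", row)

# Join the captured strings into one comma-separated string.
joined = ",".join(matches)

print(joined)  # hello,great,later
```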

CodePudding user response:

There are two issues with your code:

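  1. The capture group needs a non-greedy quantifier, written by adding ? after * (.*? instead of .*). The greedy .* matches as much as possible, so it runs to the last </aa> in the string.
  2. You should use re.findall() to get the captured groups of every match, rather than re.search(), which returns only the first match.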

With these two fixes, we get the following:

import re
row =  "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
patternAA = re.compile(r"<aa>(.*?)</aa>")
result = re.findall(patternAA, row)

# Prints ['hello', 'great', 'later']
print(result)
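
To see each of the two issues in isolation, here is a small sketch that runs the greedy and non-greedy patterns side by side, and re.search() next to re.findall(). The variable names greedy, first_only and all_matches are only for illustration:

```python
import re

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

# Greedy: .* runs to the LAST </aa>, so one long span is captured.
greedy = re.search(r"<aa>(.*)</aa>", row).group(1)

# Non-greedy: .*? stops at the FIRST </aa> after each <aa>.
# search() still returns only the first match ...
first_only = re.search(r"<aa>(.*?)</aa>", row).group(1)
# ... while findall() returns the captured group of every match.
all_matches = re.findall(r"<aa>(.*?)</aa>", row)

print(greedy)       # hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later
print(first_only)   # hello
print(all_matches)  # ['hello', 'great', 'later']
```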

CodePudding user response:

I'm not sure if you want to specifically use re, but here is a solution that works if you are not too worried about efficiency:

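def solution(row, separator1, separator2):
  output = ""
  # Keep going while both the opening and closing separators are still present.
  while row.find(separator1) != -1 and row.find(separator2) != -1:
    ind1 = row.index(separator1)
    ind2 = row.index(separator2)
    # Append the text between the separators, followed by a comma.
    output += row[ind1 + len(separator1): ind2] + ","
    # Drop everything up to and including the closing separator.
    row = row[ind2 + len(separator2):]
  # Strip the trailing comma.
  return output[:-1]

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
print(solution(row, "<aa>", "</aa>"))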

Output:

hello,great,later

As long as both "<aa>" and "</aa>" are present in the string, we find the index of each, append the text between them to the output, cut everything up to and including "</aa>" from row, and repeat.
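
The loop described above looks up both separators from the start of the string on every pass, so a closing separator that appears before the opening one can add an empty entry to the output. One possible variant, sketched here with a hypothetical helper named extract_between, uses the start argument of str.find() to search for the closing separator only after the opening one, and returns a list instead of a string:

```python
def extract_between(text, opening, closing):
    """Return every substring found between `opening` and the next `closing`."""
    results = []
    pos = 0
    while True:
        start = text.find(opening, pos)
        if start == -1:
            break
        start += len(opening)
        # Look for the closing separator only after the opening one.
        end = text.find(closing, start)
        if end == -1:
            break
        results.append(text[start:end])
        # Continue scanning after the closing separator.
        pos = end + len(closing)
    return results

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
print(",".join(extract_between(row, "<aa>", "</aa>")))  # hello,great,later
```

This version never copies the remaining string, which matters little for short inputs but avoids repeated slicing on long ones.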

I hope this helped! Please let me know if you need any further clarifications or details :)
