Tags in Sample.txt:
<ServiceRQ>want everything between...</ServiceRQ>
<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ>
..
Can someone please help me write a regex to extract the expected output from a text file? I want a regex that finds the tags above.
This is what I have tried: re.search(r"<(.*?)RQ(.*?)>(.*?)</(.*?)RQ>", line)
but it is not working properly. I want to search based on the word RQ in the text file.
The expected output should be:
1. <ServiceRQ>want everything between</ServiceRQ>
2. <ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ>
CodePudding user response:
Try this pattern
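import re
# \w+ matches the part of the tag name before "RQ"; .*? lazily matches attributes and contents
regex = r'<\w+RQ.*?>.*?</\w+RQ>'
data = re.findall(regex, line)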
The above regex will give output like
['<ServiceRQ>want everything between...</ServiceRQ>', '<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ>']
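To make the pattern above easy to try, here is a minimal, self-contained sketch. It assumes the file is processed one line at a time; the `sample` string stands in for the contents of Sample.txt, and the `<Other>` line is a hypothetical extra added only to show that tags without RQ are skipped.

```python
import re

# Stand-in for the contents of Sample.txt (the <Other> line is hypothetical).
sample = '''<ServiceRQ>want everything between...</ServiceRQ>
<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ>
<Other>ignored</Other>'''

# Opening tag whose name ends in RQ, optional attributes, contents, closing tag.
pattern = re.compile(r'<\w+RQ.*?>.*?</\w+RQ>')

matches = []
for line in sample.splitlines():
    # findall returns every non-overlapping match found in this line
    matches.extend(pattern.findall(line))

for m in matches:
    print(m)
```

Note that `.` does not match a newline by default, so this only finds tags that open and close on the same line. If a tag can span several lines, passing `re.DOTALL` to `re.compile` and searching the whole text at once may be a better fit.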
CodePudding user response:
As Ashish has mentioned, this one gives the tag including the contents.
regex = r'<\w+RQ.*?>.*?</\w+RQ>'
data=re.findall(regex, line)
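You can also retrieve JUST the contents within the tags by changing the .*? between the tags to (.*?). That turns it into a capture group, and re.findall returns only the captured group when the pattern contains one.
regex = r'<\w+RQ.*?>(.*?)</\w+RQ>'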
data = re.findall(regex, sample)
This would result in the following output:
['want everything between...', 'want everything between']
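If both the tag name and the contents are needed, one possible variation (a sketch, not something from the answers above) is to use named groups plus a backreference so that the closing tag must repeat the opening tag's name. The group names `tag` and `body` and the `sample` string are illustrative assumptions.

```python
import re

# Stand-in for the text read from Sample.txt.
sample = (
    '<ServiceRQ>want everything between...</ServiceRQ>\n'
    '<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>'
    'want everything between</ServiceRQ>'
)

tag_pattern = re.compile(
    r'<(?P<tag>\w+RQ)\b'   # tag name ending in RQ
    r'[^>]*>'              # optional attributes up to the closing >
    r'(?P<body>.*?)'       # contents, matched lazily
    r'</(?P=tag)>'         # closing tag must repeat the same name
)

results = [(m.group('tag'), m.group('body')) for m in tag_pattern.finditer(sample)]

for tag, body in results:
    print(tag, '->', body)
```

Using `[^>]*` instead of `.*?` for the attributes keeps the opening tag from running past its own `>`. For anything beyond simple, flat tags like these, an XML parser is usually more robust than a regex, although the sample here has an unterminated attribute quote that a strict parser would reject.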