Multiple grouping in regex

I have a string

s="<response>blabla  
   <head> blabla 
      <t> EXTRACT 1</t>  
      <t>EXTRACT 2</t>  
   </head>

   <body> blabla   
      <t>BODY 1</t>
      <t>BODY 2</t>
 </response>"

I need to extract the text between the <t> tags, but only if it's in the head part. I tried:

import re

regex = r"(?:<t>([\w.,_]*)*)</t>"
re.findall(regex, s)

but it also fetches the body part. I understand that I need to tell it to stop at the closing </head> tag, but I couldn't come up with a way to do that.

PS: The string is on a single line; I split it here for readability. I want to do this using regex, not XML parsers.
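A quick way to see the problem: nothing in the pattern marks where the head ends, so every <t>...</t> pair in the string matches. This sketch uses the single-line version of the string and swaps in a simpler non-greedy group `(.*?)` for the character class above (an illustration, not the exact pattern from the question):

```python
import re

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"

# No anchor to <head>...</head>, so body <t> elements are matched as well.
matches = re.findall(r"<t>(.*?)</t>", s)
print(matches)  # [' EXTRACT 1', 'EXTRACT 2', 'BODY 1', 'BODY 2']
```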

Answer:

You can extract the head section first:

import re

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"
pattern_head = r"<head>(.*)</head>"
header = re.findall(pattern_head, s)
print(header)

This gives:

[' blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  ']

Then extract what you want from the head:

pattern = "<t>(.*?)</t>"
substring = re.findall(pattern,header[0])
print(substring)

Output: [' EXTRACT 1', 'EXTRACT 2']
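The two steps can be wrapped in one helper. This is a minimal sketch: the function name `head_t_texts` is hypothetical, and it uses `re.search` instead of `re.findall` for the first step so that a string without a head section returns an empty list instead of raising an `IndexError` on `header[0]`. The non-greedy `<head>(.*?)</head>` stops at the first closing tag, and `re.DOTALL` lets `.` cross newlines in case the input is not actually on one line:

```python
import re

def head_t_texts(s):
    """Return the text of every <t>...</t> inside <head>...</head> (hypothetical helper)."""
    # re.search returns the first match object, or None if there is no head section.
    head = re.search(r"<head>(.*?)</head>", s, re.DOTALL)
    if head is None:
        return []
    # Search only inside the captured head text, so body <t> elements are never seen.
    return re.findall(r"<t>(.*?)</t>", head.group(1), re.DOTALL)

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"
print(head_t_texts(s))  # [' EXTRACT 1', 'EXTRACT 2']
```

If the leading space in ' EXTRACT 1' is unwanted, calling `.strip()` on each result removes it.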

Answer:

I got the solution from @oriberu

regex = r"<t>(\w+)</t>(?=.*?</head>)"
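The lookahead `(?=.*?</head>)` accepts a <t>...</t> match only if a `</head>` occurs somewhere later in the string; since the body comes after the head, body tags have no `</head>` after them and are rejected. Note that `\w+` does not match text containing spaces, so on the sample string above it finds nothing. The sketch below assumes a broader class, `[^<]*`, which is a swapped-in choice rather than the pattern from this answer:

```python
import re

s = "<response>blabla  <head> blabla <t> EXTRACT 1</t>  <t>EXTRACT 2</t>  </head> <body> blabla  <t>BODY 1</t> <t>BODY 2</t> </response>"

# [^<]* (assumed replacement for \w+) matches any text up to the next '<', spaces included.
# The lookahead requires "</head>" to appear later, which excludes <t> elements in the body.
pattern = r"<t>([^<]*)</t>(?=.*?</head>)"
head_only = re.findall(pattern, s)
print(head_only)  # [' EXTRACT 1', 'EXTRACT 2']

# With \w+ the sample's contents (" EXTRACT 1", "EXTRACT 2") contain spaces and do not match.
word_only = re.findall(r"<t>(\w+)</t>(?=.*?</head>)", s)
print(word_only)  # []
```

This relies on the head section coming before the body and on `</head>` appearing only once; `.` also stops at newlines unless `re.DOTALL` is passed. Because the lookahead rescans the rest of the string for every candidate, the two-step approach may scale better on long inputs.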