I have a string:
"<li style="-moz-float-edge: content-box">... that in <i><b><a href="/wiki/Laßt_uns_sorgen,_laßt_uns_wachen,_BWV_213" title="Lat uns sorgen, lat uns wachen, BWV 213">Die Wahl des Herkules</a></b></i>, Hercules must choose between the good cop and the bad cop?<br style="clear:both;" />"
and I want to get the last tag:
"<br style="clear:both;" />"
My regex, r'[<]([\w]+\b)(.^<)+[/][>]', doesn't work. I expected it to find a match by excluding the '<' symbol.
https://regex101.com/r/BDD30S/1
CodePudding user response:
Note: Using regex to parse HTML is a terrible idea!
However, I cannot resist a challenge, so here goes:
import re
haystack = '<li style="-moz-float-edge: content-box">... that in <i><b><a href="/wiki/Laßt_uns_sorgen,_laßt_uns_wachen,_BWV_213" title="Lat uns sorgen, lat uns wachen, BWV 213">Die Wahl des Herkules</a></b></i>, Hercules must choose between the good cop and the bad cop?<br style="clear:both;" />'
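# A "tag" here is '<', then any run of characters other than '<' or '>', then '>'
needle = r'(<[^<>]*>)'
matches = re.findall(needle, haystack)
if matches:
    # findall returns matches from left to right, so the last entry is the last tag
    print(matches[-1])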
This code finds the last non-nested tag, that is, the last '<' ... '>' pair with no other '<' or '>' in between. It fails horribly if the element has < or > anywhere in its attributes or text content.
If you had an opening and a closing tag for an element, this would find only the closing tag.
Output:
<br style="clear:both;" />
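As a rough alternative to the regex, and only as a sketch, Python's standard-library html.parser.HTMLParser can keep track of the last tag it sees, which copes with < or > inside quoted attribute values. The class name LastTagFinder and the helper find_last_tag are made up for this illustration, and end tags are rebuilt from the tag name rather than copied from the input.

```python
from html.parser import HTMLParser


class LastTagFinder(HTMLParser):
    """Remembers the most recently seen tag while parsing (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.last_tag = None

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the raw text of the start tag just parsed
        self.last_tag = self.get_starttag_text()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> arrive here; keep their raw text too
        self.last_tag = self.get_starttag_text()

    def handle_endtag(self, tag):
        # The parser only gives the lowercased name, so rebuild the end tag
        self.last_tag = f"</{tag}>"


def find_last_tag(html):
    """Return the last tag found in html, or None if there is none."""
    finder = LastTagFinder()
    finder.feed(html)
    finder.close()
    return finder.last_tag


print(find_last_tag('<li>... the bad cop?<br style="clear:both;" />'))
```

For anything beyond a one-off, a dedicated HTML parsing library is probably the safer choice.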
CodePudding user response:
If you really want to use regex, do this:
(<[^<>]+>)[^<>]*$ /m
- Use the /m flag along with the $ anchor to mark the end of the line
- [^<>]+ captures everything inside the HTML tag
- [^<>]* ensures that there can be stuff between the last tag and the end of the line
- The expected result is available in the capturing group
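In Python the /m flag is spelled re.MULTILINE (or re.M). A minimal sketch of the pattern above, assuming the input may span several lines; the sample text and variable names are invented for illustration:

```python
import re

# Hypothetical two-line input; the first line has text after its last tag
text = '<p>one</p> tail\n<i>two</i><br />'

# re.MULTILINE makes $ match at the end of every line, not only at the end of the string
pattern = re.compile(r'(<[^<>]+>)[^<>]*$', re.MULTILINE)

# With a single capturing group, findall returns the group contents for each match
last_tags = pattern.findall(text)
print(last_tags)
```

On a single-line string such as the one in the question, pattern.search(...).group(1) gives the one last tag.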
CodePudding user response:
To get the last tag on the same line:
.*(<[^<>\n]*>)
Explanation
- .* Match the whole line
- (<[^<>\n]*>) Capture in group 1 <...>
The last tag in all lines:
[\s\S]*(<[^<>]+>)
Explanation
- [\s\S]* Match all characters
- (<[^<>]+>) Capture in group 1 <...>
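A quick way to try both patterns from Python, as a sketch with an invented two-line sample. Both rely on the greedy prefix backtracking until the capture group fits, so group 1 ends up on the last tag. re.search stops at the first line that contains a tag, while re.findall applies the per-line pattern to every line.

```python
import re

# Hypothetical input with one tag-bearing line after another
text = '<p>one</p>\n<i>two</i><br />'

per_line = r'.*(<[^<>\n]*>)'       # last tag on a line
whole_text = r'[\s\S]*(<[^<>]+>)'  # last tag in the entire text

first_line_last = re.search(per_line, text).group(1)
each_line_last = re.findall(per_line, text)
overall_last = re.search(whole_text, text).group(1)

print(first_line_last)
print(each_line_last)
print(overall_last)
```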
CodePudding user response:
This finds the last tag in the example string, where the last tag is also the only <br>:
Regex:
r'<br.*$'
Code:
import re
my_string = '<li style="-moz-float-edge: content-box">... that in <i><b><a href="/wiki/Laßt_uns_sorgen,_laßt_uns_wachen,_BWV_213" title="Lat uns sorgen, lat uns wachen, BWV 213">Die Wahl des Herkules</a></b></i>, Hercules must choose between the good cop and the bad cop?<br style="clear:both;" />'
last_tag = re.search(r'<br.*$', my_string)
print(last_tag[0])
Output:
<br style="clear:both;" />
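Two things worth checking before relying on this pattern, shown as a sketch with invented inputs: re.search returns None when there is no <br, and the match starts at the first <br, not the last one. The helper name from_first_br is hypothetical.

```python
import re


def from_first_br(text):
    """Return the text from the first '<br' to the end of the string, or None if there is no match."""
    match = re.search(r'<br.*$', text)
    return match[0] if match else None


# Two <br> tags: the result starts at the first one and runs to the end
print(from_first_br('a<br>b<br style="clear:both;" />'))

# No <br> at all: nothing to index into, so the helper returns None
print(from_first_br('<p>no line break here</p>'))
```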