Regex - how can I get the last tag in this string

Time:12-22

I have a string

"<li style="-moz-float-edge: content-box">... that in <i><b><a href="/wiki/Laßt_uns_sorgen,_laßt_uns_wachen,_BWV_213" title="Lat uns sorgen, lat uns wachen, BWV 213">Die Wahl des Herkules</a></b></i>, Hercules must choose between the good cop and the bad cop?<br style="clear:both;" />" 

and I want to get the last tag

"<br style="clear:both;" />"

My regex, r'[<]([\w]+\b)(.^<)+[/][>]', doesn't work. I expected it to match by excluding the '<' symbol.

https://regex101.com/r/BDD30S/1

CodePudding user response:

Note: Using Regex to parse HTML is a terrible idea!

However, I cannot resist a challenge, so here goes:

import re

haystack = '<li style="-moz-float-edge: content-box">... that in <i><b><a href="/wiki/Laßt_uns_sorgen,_laßt_uns_wachen,_BWV_213" title="Lat uns sorgen, lat uns wachen, BWV 213">Die Wahl des Herkules</a></b></i>, Hercules must choose between the good cop and the bad cop?<br style="clear:both;" />'

needle = r'(<[^<>]*>)'
matches = re.findall(needle, haystack)
if matches:
  print(matches[-1])

This code finds the last substring that starts with < and ends with > and has no angle brackets in between. It fails if < or > appears anywhere inside an attribute value or in the text content. For an element with both an opening and a closing tag, it returns only the closing tag.

<br style="clear:both;" />
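To illustrate the caveat about < or > inside attributes, here is a minimal sketch that uses the standard library's html.parser instead of a regex. The names LastTagFinder and last_tag are made up for this example, and closing tags are rebuilt as </name>, so their original spelling is not preserved.

```python
import re
from html.parser import HTMLParser


class LastTagFinder(HTMLParser):
    """Hypothetical helper: remembers the most recent tag the parser saw."""

    def __init__(self):
        super().__init__()
        self.last_tag = None

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the raw source of the start tag.
        self.last_tag = self.get_starttag_text()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> arrive here.
        self.last_tag = self.get_starttag_text()

    def handle_endtag(self, tag):
        # The raw text of an end tag is not exposed, so rebuild it.
        self.last_tag = f"</{tag}>"


def last_tag(html):
    finder = LastTagFinder()
    finder.feed(html)
    finder.close()
    return finder.last_tag


tricky = '<p>text</p><img alt="a>b">'
regex_result = re.findall(r'(<[^<>]*>)', tricky)[-1]
parser_result = last_tag(tricky)
print(regex_result)   # the regex stops at the > inside the attribute
print(parser_result)  # the parser keeps the quoted attribute intact
```

On the string from the question, last_tag returns the same <br style="clear:both;" /> as the regex.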

CodePudding user response:

If you really want to use regex, do this:

(<[^<>]+>)[^<>]*$ /m

  1. Use the /m flag so that the $ anchor matches at the end of each line
  2. [^<>]+ matches everything inside the HTML tag
  3. [^<>]* allows other text between the last tag and the end of the line
  4. The expected result is available in the capturing group
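In Python the /m flag is spelled re.MULTILINE. A minimal sketch of the steps above, assuming the pattern reads (<[^<>]+>)[^<>]*$ with a + inside the tag; the sample text and the name LAST_TAG_PER_LINE are made up for illustration:

```python
import re

# Assumed reading of the pattern, with "+" as the quantifier inside the tag.
LAST_TAG_PER_LINE = re.compile(r'(<[^<>]+>)[^<>]*$', re.MULTILINE)

text = '<p>first</p> trailing\n<i>x</i><br style="clear:both;" />'

# With one capturing group, findall returns the group for every match,
# which here is the last tag on each line.
last_tags = LAST_TAG_PER_LINE.findall(text)
print(last_tags)  # ['</p>', '<br style="clear:both;" />']
```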


CodePudding user response:

To get the last tag on the same line:

.*(<[^<>\n]*>)

Explanation

  • .* Match the whole line
  • (<[^<>\n]*>) Capture in group 1 <...>
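A short Python sketch of the per-line pattern; the sample text is made up for illustration. Because . does not match a newline by default, re.findall yields one captured tag per line that contains a tag:

```python
import re

text = '<p>one</p><b>x</b>\n<i>two</i><br />'

# .* is greedy, so it consumes the line and backtracks to the last <...>.
per_line = re.findall(r'.*(<[^<>\n]*>)', text)
print(per_line)  # ['</b>', '<br />']
```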



The last tag in all lines:

[\s\S]*(<[^<>]+>)

Explanation

  • [\s\S]* Match all characters, including newlines
  • (<[^<>]+>) Capture in group 1 <...>
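A short Python sketch of the whole-text variant, assuming the quantifier inside the tag is + as in [\s\S]*(<[^<>]+>); the sample text is made up for illustration:

```python
import re

text = '<p>one</p><b>x</b>\n<i>two</i><br />\nplain text'

# [\s\S]* also crosses newlines, so the greedy match backtracks from the
# end of the whole string to the very last <...>.
match = re.match(r'[\s\S]*(<[^<>]+>)', text)
last_tag = match.group(1) if match else None
print(last_tag)  # <br />
```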


CodePudding user response:

This finds the last tag in this particular string, because the last tag is also the only <br> in it:

Regex: r'<br.*$'

Code:

import re

my_string = '<li style="-moz-float-edge: content-box">... that in <i><b><a href="/wiki/Laßt_uns_sorgen,_laßt_uns_wachen,_BWV_213" title="Lat uns sorgen, lat uns wachen, BWV 213">Die Wahl des Herkules</a></b></i>, Hercules must choose between the good cop and the bad cop?<br style="clear:both;" />'

last_tag = re.search(r'<br.*$', my_string)

print(last_tag[0])

Output:

<br style="clear:both;" />
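Note that this pattern depends on knowing the tag name and on there being a single <br in the string. A small sketch of what happens otherwise, with a made-up sample string; the findall variant is just one possible way to narrow it down:

```python
import re

text = 'line one<br />line two<br class="x" />'

# <br.*$ starts at the FIRST <br and runs to the end of the string.
greedy = re.search(r'<br.*$', text)[0]
print(greedy)  # <br />line two<br class="x" />

# Matching each <br ...> separately and taking the last one avoids that.
last_br = re.findall(r'<br[^<>]*>', text)[-1]
print(last_br)  # <br class="x" />
```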