How can I get the html tags from an input rather than the text?

Time:08-15

I'm trying to make a program that takes an input and outputs its HTML tags. So far, I've only managed to do the opposite:

import re

text = '<p>I want this bit removed</p>'
tags = re.search('>(.*)<', text)

print(tags.group(1))

At the moment, running this removes the HTML tags and keeps the text. Instead, I want the output to be ['p', '/p']. How can I do this? I'd also like it to work for any input.

Also, if possible, I'd like to adapt this to a for loop.

Answer:

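Change the regex to capture the text inside the < > instead, and use re.findall rather than re.search, because re.search only returns the first match while re.findall returns every one.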

import re

text = '<p>I want this bit removed</p>'
# Capture everything between '<' and '>'; [^>] means "any character except '>'"
tags = re.findall(r'<([^>]*)>', text)

print(tags)  # re.findall returns a list of the captured groups: ['p', '/p']
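Since the question also asks for a for loop and for something that adapts to any input, here is a minimal sketch that wraps the same regex in a function and loops over several strings. The function name `get_tags` and the sample `inputs` list are hypothetical, chosen for illustration; swap in your own source of text (for example, lines read with `input()`). Note that a tag with attributes such as `<a href="x">` is captured whole, as `a href="x"`.

```python
import re

# Compile once: capture everything between a '<' and the next '>'.
TAG_PATTERN = re.compile(r'<([^>]*)>')


def get_tags(text):
    """Return the contents of every <...> tag in text, in order of appearance."""
    return TAG_PATTERN.findall(text)


# Hypothetical sample inputs; replace with your own strings.
inputs = [
    '<p>I want this bit removed</p>',
    '<b>bold</b> and <i>italic</i>',
    'no tags here',
]

for text in inputs:
    print(get_tags(text))
```

This prints `['p', '/p']`, then `['b', '/b', 'i', '/i']`, then `[]` for the string with no tags. A regex is fine for short, simple snippets like these; for real-world HTML, Python's built-in `html.parser.HTMLParser` is more robust.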