I'm trying to write a program that takes some input and outputs the HTML tags it contains. So far I've only managed to do the opposite:
import re
text = '<p>I want this bit removed</p>'
tags = re.search('>(.*)<', text)
print(tags.group(1))
At the moment, running this removes the HTML tags and keeps the text. What I want instead is the output ['p', '/p']. How can I do this? It should also work for any input.
Also, if possible, I'd like to adapt this to a for loop.
CodePudding user response:
Just change the regex to capture the text between the < and > instead.
import re
text = '<p>I want this bit removed</p>'
tags = re.findall('<([^>]*)>', text)  # [^>] means any character except `>`
print(tags)  # findall returns a list of the captured groups: ['p', '/p']
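For the for-loop variant asked about in the question, one possible sketch uses re.finditer, which yields one match object per tag. The function name extract_tags and the stricter pattern (which keeps only the tag name and drops attributes such as href="...") are assumptions for illustration, not part of the answer above. A regex like this is fine for simple snippets, but it is not a full HTML parser.

```python
import re

# Hypothetical pattern: capture an optional "/" plus the tag name,
# then skip any attributes up to the closing ">".
TAG_PATTERN = re.compile(r'<(/?[A-Za-z][A-Za-z0-9]*)[^>]*>')

def extract_tags(text):
    """Return the tag names found in text, in order of appearance."""
    tags = []
    for match in TAG_PATTERN.finditer(text):
        tags.append(match.group(1))
    return tags

print(extract_tags('<p>I want this bit removed</p>'))  # ['p', '/p']
print(extract_tags('<a href="x">link</a><br/>'))       # ['a', '/a', 'br']
```

Comments such as <!-- ... --> and doctype declarations are skipped by this pattern because they do not start with a letter. For real-world HTML, the standard library's html.parser module is likely a more robust choice.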