How can I extract "Value1" and "Value2" from this string?
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
I tried this, but it doesn't work:
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
for i in my_str:
    i = str(i).split('^<a.*>$|</a>')
    print(i)
CodePudding user response:
You can use bs4.BeautifulSoup:
from bs4 import BeautifulSoup
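# Name the parser explicitly; otherwise bs4 picks one and emits a warning
soup = BeautifulSoup(my_str, 'html.parser')
out = [st.string for st in soup.find_all('a')]
print(out)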
Output:
['Value1', 'Value2']
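If installing bs4 is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is only a minimal sketch: the class name AnchorTextParser and its attributes are made up for illustration, and it assumes the <a> elements are not nested.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Hypothetical helper: collects the text inside each <a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Start a new (empty) value whenever an <a> tag opens
        if tag == "a":
            self.in_anchor = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        # Text is only kept while we are inside an <a> element
        if self.in_anchor:
            self.values[-1] += data


my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
parser = AnchorTextParser()
parser.feed(my_str)
parser.close()
out = parser.values
print(out)
```

It is more code than the bs4 version, but it needs no third-party package.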
CodePudding user response:
Another way is to clean the string up with plain string methods: split on one delimiter, then strip out the unwanted parts of each piece.
Here's the code I used:
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
strList = my_str.split('/a>', maxsplit=2)
for i in strList:
    try:
        print(i.split('>')[1].replace('<', ''))
    except IndexError:
        pass
This prints Value1 and Value2. The IndexError is caught because the last piece after the final '/a>' is an empty string.
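If you would rather collect the values in a list than print them, the same split-and-clean idea might look like the sketch below. The name values is just illustrative, and it still assumes every piece has the shape '<a ...>text<'.

```python
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

values = [
    # Keep what follows the first '>' and drop the trailing '<' left over from '</a>'
    chunk.split('>', 1)[1].rstrip('<')
    for chunk in my_str.split('/a>')
    if '>' in chunk  # skips the empty piece after the last '/a>'
]
print(values)
```

Checking for '>' up front replaces the try/except, since the only piece without one is the empty trailing string.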
CodePudding user response:
If you want to do this with a regex on HTML, which again you shouldn't (see the bs4 answer above for a much better approach), you can use re.findall with a lookbehind and a lookahead:
import re
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
split_str = re.findall(r'(?<=>)\w*?(?=<\/a>)', my_str)
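As for why the attempt in the question fails: str.split treats its argument as a literal separator rather than a regex, and looping over a string yields one character at a time. A literal regex split would need re.split instead. The sketch below shows both points; the tag pattern <[^>]+> is a simplification that assumes no '>' appears inside attribute values.

```python
import re

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

# str.split looks for this exact text, which never occurs, so nothing is split
literal = my_str.split('^<a.*>$|</a>')

# re.split does interpret the pattern; splitting on every tag leaves the text
# between tags, plus empty strings that are filtered out here
parts = [p for p in re.split(r'<[^>]+>', my_str) if p]
print(parts)
```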
CodePudding user response:
This works if you want the entire HTML element for each:
import re
re.sub("(a>)(<a)", "\\1[SEP]\\2", my_str).split("[SEP]")
If you just want the values, do this:
re.findall(r">([^<]+)</a>", my_str)