How can I split a string between groups of words in Python?

How can I extract "Value1" and "Value2" from this string?

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

I tried this, but it doesn't work.

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
for i in my_str:
    i = str(i).split('^<a.*>$|</a>')
    print(i)

CodePudding user response：

You can use bs4.BeautifulSoup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(my_str, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
out = [st.string for st in soup.find_all('a')]

Output:

['Value1', 'Value2']
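
If installing bs4 is not an option, roughly the same result can be sketched with the standard library's html.parser module. Note that this swaps in a different technique than the bs4 call above, and the class name AnchorTextParser is made up for illustration. It assumes the links are not nested.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Hypothetical helper: collects the text found inside each <a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Start a new, empty entry whenever an <a> tag opens.
        if tag == 'a':
            self.in_anchor = True
            self.values.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_anchor = False

    def handle_data(self, data):
        # Only keep text that appears between <a> and </a>.
        if self.in_anchor:
            self.values[-1] += data


my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
parser = AnchorTextParser()
parser.feed(my_str)
parser.close()
print(parser.values)  # ['Value1', 'Value2']
```

For anything beyond simple cases like this one, bs4 is still the more convenient choice.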

CodePudding user response：

Another way is to use plain string cleaning: split on the closing tag, then strip the unwanted parts from each piece.

Here's the code I used:


my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

strList = my_str.split('/a>',maxsplit = 2)

for i in strList:
    try:
        print(i.split('>')[1].replace('<',''))
    except IndexError:
        pass

This will print Value1 and Value2.

CodePudding user response：

If you want to use a regex on HTML, which you generally shouldn't (see the bs4 answer above for a much better approach), you can match the text between ">" and "</a>" with lookarounds:

import re
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
split_str = re.findall(r'(?<=>)\w*?(?=<\/a>)', my_str)
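
To make the behaviour of that pattern concrete, here is a small sketch of what it returns and where it stops matching. The second input string and the looser [^<]+ pattern are my own assumptions for illustration, not part of the pattern above: \w only matches letters, digits, and underscores, so link text containing a space is silently skipped.

```python
import re

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

# (?<=>) requires a '>' just before the match, (?=</a>) requires '</a>' just after it.
strict = re.findall(r'(?<=>)\w*?(?=</a>)', my_str)
print(strict)  # ['Value1', 'Value2']

# Hypothetical input with a space in the link text.
spaced = '<a href="x.html">Value 1</a>'
missed = re.findall(r'(?<=>)\w*?(?=</a>)', spaced)
print(missed)  # []

# Looser variant: accept any run of characters that are not '<'.
loose = re.findall(r'(?<=>)[^<]+(?=</a>)', spaced)
print(loose)  # ['Value 1']
```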

CodePudding user response：

This works if you want the entire HTML element for each link.

import re
re.sub("(a>)(<a)", "\\1[SEP]\\2", my_str).split("[SEP]")
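
The [SEP] placeholder can probably be avoided by splitting directly on the empty position between "</a>" and "<a". This is only a sketch, and it assumes Python 3.7 or later, where re.split accepts patterns that match an empty string.

```python
import re

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

# Zero-width split: the position must be preceded by '</a>' and followed by '<a'.
parts = re.split(r'(?<=</a>)(?=<a)', my_str)
for part in parts:
    print(part)
```

This keeps each element intact and does not break if the text itself happens to contain "[SEP]".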

If you just want the values, do this:

re.findall(r">([^<]+)</a>", my_str)