How can I split a string between groups of words in Python?

Time:12-19

How can I extract "Value1" and "Value2" from this string?

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

I tried this, but it doesn't work.

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
for i in my_str:
    i = str(i).split('^<a.*>$|</a>')
    print(i)

CodePudding user response:

You can use bs4.BeautifulSoup:

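from bs4 import BeautifulSoup

# Name the parser explicitly; without it bs4 picks one itself and emits a warning.
soup = BeautifulSoup(my_str, 'html.parser')
# .string is the text inside each <a> tag
out = [st.string for st in soup.find_all('a')]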

Output:

['Value1', 'Value2']
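If installing bs4 is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is a swapped-in alternative, not the bs4 approach itself, and the class name AnchorTextParser is made up for this example.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):  # hypothetical helper name
    """Collects the text found inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.values.append("")  # start a new value for this link

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.values[-1] += data  # text may arrive in several chunks


my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

parser = AnchorTextParser()
parser.feed(my_str)
parser.close()
print(parser.values)  # ['Value1', 'Value2']
```

Like bs4, this parses the markup instead of pattern-matching on it, so attribute order or extra attributes on the links should not matter.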

CodePudding user response:

Another way is to clean the string by hand: split on a delimiter, then strip out the unwanted parts.

Here's the code I used:

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

strList = my_str.split('/a>', maxsplit=2)

for i in strList:
    try:
        print(i.split('>')[1].replace('<',''))
    except IndexError:
        pass

This will print Value1 and Value2.

CodePudding user response:

If you want to use a regex on HTML (which, again, you shouldn't; see the bs4 answer above for a much better approach), you can do this:

import re
my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'
split_str = re.findall(r'(?<=>)\w*?(?=<\/a>)', my_str)
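For reference, here is a small sketch of what that lookaround pattern returns, and where it stops working. The second input string (spaced) and the broader character class are illustrative assumptions, not part of the question.

```python
import re

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

# (?<=>) requires a '>' just before the match, (?=</a>) requires '</a>' just after it.
word_only = re.findall(r'(?<=>)\w*?(?=</a>)', my_str)
print(word_only)  # ['Value1', 'Value2']

# Hypothetical input whose link text contains a space: \w does not match spaces.
spaced = '<a href="x.html">Value 1</a><a href="y.html">Value 2</a>'
print(re.findall(r'(?<=>)\w*?(?=</a>)', spaced))  # []

# Allowing any character except '<' handles that case.
broader = re.findall(r'(?<=>)[^<]*?(?=</a>)', spaced)
print(broader)  # ['Value 1', 'Value 2']
```

The pattern only suits link text made of letters, digits, and underscores; anything else needs a wider character class or a real parser.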

CodePudding user response:

This works if you want the entire HTML element for each link.

import re
re.sub("(a>)(<a)", "\\1[SEP]\\2", my_str).split("[SEP]")

If you just want the values, do this:

re.findall(r">([^<]+)</a>", my_str)
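As a variation, both results can be had without a placeholder such as [SEP]. This is a sketch assuming Python 3.7 or later, where re.split accepts a pattern that matches an empty string; the variable names are only for illustration.

```python
import re

my_str = '<a href="default.html" target="_top">Value1</a><a href="browser.html" target="_top">Value2</a>'

# Split at the zero-width position between a closing </a> and the next <a ...>.
elements = re.split(r'(?<=</a>)(?=<a\b)', my_str)
for element in elements:
    print(element)

# Capture the text between a '>' and the following '</a>'.
values = re.findall(r'>([^<]+)</a>', my_str)
print(values)  # ['Value1', 'Value2']
```

Since nothing is inserted into the string, this avoids any clash if the text itself happens to contain the separator.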