Python: How to scrape the string 7.7872 in <span class='pos'><span class='ar

Time:09-21

I am trying to scrape the following line and extract the value 7.7872. How do I make it work?

<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>

I tried the following code, but there is a blank string that I cannot get rid of:

for a in soupUSD.find_all("span", attrs={"class":"pos"})[0]:
    print(a)

I have the following result:

<span class='arr_ud arrow_u5'> </span>&nbsp;7.7872

Is there a way to get only the text 7.7872?

Answer:

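from bs4 import BeautifulSoup

spam = "<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>"
soup = BeautifulSoup(spam, 'html.parser')
span = soup.find('span', {'class':'pos'})
# stripped_strings yields each text node with leading/trailing whitespace (including &nbsp;) removed,
# and skips nodes that are empty after stripping, such as the ' ' inside the arrow span
print(' '.join(span.stripped_strings))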

output

7.7872
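To see why this works, it helps to compare .strings with .stripped_strings on the same markup. This is a minimal sketch assuming bs4 with the bundled html.parser backend. .strings yields every text node unchanged, including the whitespace-only one inside the arrow span and the decoded &nbsp; (U+00A0). .stripped_strings strips each node and drops the ones that end up empty.

```python
from bs4 import BeautifulSoup

spam = "<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>"
soup = BeautifulSoup(spam, "html.parser")
span = soup.find("span", {"class": "pos"})

# .strings: every text node under the tag, untouched (converted to plain str)
raw = [str(s) for s in span.strings]
# .stripped_strings: each node stripped, whitespace-only nodes dropped
clean = [str(s) for s in span.stripped_strings]

print(raw)    # [' ', '\xa07.7872']  (&nbsp; is decoded to U+00A0)
print(clean)  # ['7.7872']
```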

Answer:

The target text sits at the same level as another tag (the arrow span), so the outer span has more than one child. In that case its .string attribute returns None. Instead, loop over the tag's direct children (.contents), keep only the NavigableString instances, and convert each one to a plain, stripped string.

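from bs4 import BeautifulSoup, NavigableString

spam = "<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>"
soup = BeautifulSoup(spam, 'lxml')  # needs the third-party lxml package; 'html.parser' also works here
span = soup.find('span', class_='pos')

# Keep only the direct text children (skipping the nested arrow span) and strip whitespace/&nbsp;
nr = ''.join([str(string).strip() for string in span.contents if isinstance(string, NavigableString)])

print(nr)
# 7.7872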
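The following is a small sketch of why .string fails on the outer span. It uses the bundled html.parser backend instead of lxml so that no extra package is assumed. .string only returns a value when a tag has exactly one child. The outer span has two (a Tag and a NavigableString), while the inner arrow span has one.

```python
from bs4 import BeautifulSoup

spam = "<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>"
soup = BeautifulSoup(spam, "html.parser")
span = soup.find("span", class_="pos")

# Two direct children, so .string cannot choose one and returns None
print(span.string)  # None

# .contents lists only the direct children, not nested text
kinds = [type(child).__name__ for child in span.contents]
print(kinds)  # ['Tag', 'NavigableString']

# The inner span (span.span) has a single child, so its .string works
inner = str(span.span.string)
print(repr(inner))  # ' '
```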

Answer:

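Using only the standard library (xml.etree.ElementTree). &nbsp; is not one of XML's predefined entities, so it has to be declared in an internal DTD subset before the snippet will parse:

import xml.etree.ElementTree as ET

# XML only predefines &amp; &lt; &gt; &quot; &apos;, so declare &nbsp; (U+00A0)
# in an internal DTD subset that is prepended to the snippet
dtd = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
            <!ENTITY nbsp '&#160;'>
            ]>'''

html = '''<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>'''
root = ET.fromstring(dtd + html)
# root is the outer span; the number is the tail text that follows the inner span
print(list(root)[0].tail.strip())

output

7.7872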
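Another standard-library option avoids the DTD workaround. This is a different technique from the ElementTree answer: html.parser.HTMLParser, which already understands HTML entities such as &nbsp;. The following is a minimal sketch. The class name PosSpanText is hypothetical. It assumes the class attribute is exactly 'pos' and that the target span contains no unclosed void tags (such as a bare <br>), which would throw off the depth counter.

```python
from html.parser import HTMLParser


class PosSpanText(HTMLParser):
    """Hypothetical helper: collect text that is a direct child of <span class='pos'>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &nbsp; arrives as '\xa0'
        self.depth = 0      # nesting depth inside the target span; 0 means outside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # a tag nested inside the target span
        elif tag == "span" and ("class", "pos") in attrs:
            self.depth = 1   # entered the target span

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:  # only text directly inside the target span
            self.parts.append(data)


html = "<span class='pos'><span class='arr_ud arrow_u5'> </span>&nbsp;7.7872</span>"
parser = PosSpanText()
parser.feed(html)
parser.close()
value = "".join(parser.parts).strip()  # str.strip() also removes '\xa0'
print(value)  # 7.7872
```

With this approach, the ' ' inside the arrow span is ignored because it sits at depth 2. Only the text that follows the arrow span is kept.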