I am trying to scrape the following line and extract the value 7.7872. How do I make it work?
<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>
I tried the following code, but the result still contains a blank string that I cannot get rid of:
for a in soupUSD.find_all("span", attrs={"class":"pos"})[0]:
    print(a)
I have the following result:
<span class='arr_ud arrow_u5'> </span> 7.7872
Is there a way to get only the text 7.7872?
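Why the loop prints both pieces: find_all(...)[0] returns a single Tag, and iterating over a Tag walks its direct children rather than its text. A minimal sketch, assuming the snippet from the question as input and Python's built-in html.parser backend, that shows the two children:

```python
from bs4 import BeautifulSoup

html = "<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>"
soup = BeautifulSoup(html, "html.parser")

# find_all(...)[0] is one Tag (the outer span), not a list of matches.
outer = soup.find_all("span", attrs={"class": "pos"})[0]

# Iterating over a Tag yields its direct children in document order:
# the inner <span> (a Tag) and the text node after it (a NavigableString).
children = list(outer)
for child in children:
    print(type(child).__name__, repr(str(child)))
```

The number is the second child and still carries a leading space, so it has to be isolated from the inner span and stripped.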
Answer:
from bs4 import BeautifulSoup
spam = "<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>"
soup = BeautifulSoup(spam, 'html.parser')
span = soup.find('span', {'class':'pos'})
print(' '.join(span.stripped_strings))
Output:
7.7872
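A close alternative, assuming the number is the only non-whitespace text inside the outer span, is get_text(strip=True): it strips every text node and concatenates the non-empty ones. This sketch uses the same input and also converts the result with float(), which fails loudly if the scrape ever returns something that is not a number:

```python
from bs4 import BeautifulSoup

html = "<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>"
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "pos"})

# strip=True trims each text node and drops the whitespace-only one
# inside the inner <span>, so only the number is left.
text = span.get_text(strip=True)

# Raises ValueError if the text is not a valid number.
value = float(text)
print(text, value)
```

If the span ever holds more than one piece of visible text, get_text joins them with no separator by default, whereas the ' '.join(span.stripped_strings) version above keeps them space-separated.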
Answer:
Because the target string sits at the same level as another tag, the .string attribute returns None here (it only returns text when a tag has exactly one child). Instead, loop over the tag's direct children (.contents), keep the ones that are NavigableString instances, and convert them to plain strings.
from bs4 import BeautifulSoup, NavigableString
spam = "<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>"
soup = BeautifulSoup(spam, 'lxml')
span = soup.find('span', class_='pos')
nr = ''.join([str(string).strip() for string in span.contents if isinstance(string, NavigableString)])
print(nr)
# 7.7872
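A minimal sketch of the point about .string, plus a sibling-based alternative. It assumes the arrow span always comes immediately before the number (true for the snippet in the question, but an assumption about the real page), and it uses html.parser so lxml is not required:

```python
from bs4 import BeautifulSoup

html = "<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>"
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="pos")

# .string is None because the outer span has two children, not one.
print(span.string)

# The number is the text node directly after the inner (arrow) span.
inner = span.find("span")
number = inner.next_sibling.strip()
print(number)
```

This is shorter than filtering .contents, but it breaks if the page ever inserts other nodes between the arrow and the number; the isinstance filter above does not depend on that ordering.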
Answer:
Using only the Python standard library (ElementTree):
import xml.etree.ElementTree as ET
dtd = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY nbsp ' '>
]>'''
html = '''<span class='pos'><span class='arr_ud arrow_u5'> </span> 7.7872</span>'''
root = ET.fromstring(dtd + html)
print(list(root)[0].tail.strip())
Output:
7.7872
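If the live page writes the arrow's blank as &nbsp; (as the DTD in this answer suggests), another standard-library option avoids the DTD entirely: XML understands numeric character references such as &#160; natively, and Element.itertext() walks every text and tail string in order. A sketch under that assumption, with &#160; substituted by hand into the snippet:

```python
import xml.etree.ElementTree as ET

# Assumption: the page's &nbsp; is written here as the numeric reference
# &#160;, which an XML parser accepts without any DTD.
html = "<span class='pos'><span class='arr_ud arrow_u5'>&#160;</span> 7.7872</span>"
root = ET.fromstring(html)

# itertext() yields the inner span's text (U+00A0) and its tail (" 7.7872");
# str.strip() removes both the ordinary and the non-breaking space.
number = "".join(root.itertext()).strip()
print(number)
```

Real-world HTML is often not well-formed XML, so this only works on fragments that parse cleanly; for full pages an HTML parser such as BeautifulSoup is the safer choice.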