How to get span items after < !-- -- > in soup

Time:03-28

Hi, I am trying to get the value 377 that comes after "Sold", where it is preceded by a < !-- -- >. How do I do that? I got 2 items with the following code. (I added a space inside the comment marker so that it is visible.)

sold = soup.find_all('span', {"class":"jsx-302 jsx-385"})

Result:

<span class="jsx-302 jsx-385"><span class="jsx-302 jsx-385 sold-text">Sold</span> < !-- -- >377</span>,

<span class="jsx-302 jsx-385">Rp41,400 / 100 g</span>

I could use a regex on items[0].text to pick out the part after "Sold" and ignore the rest. However, is there a way to handle a span that contains < !-- -- > directly?

CodePudding user response:

I agree that split() would work, but the HTML does not look valid, so it is not clear whether the page really contains < !-- -- > (plain text) or <!-- --> (a real comment), and the two behave differently.

In case of < !-- -- >:

soup.select_one('span:has(.sold-text)').text.split('>')[-1]

In case of <!-- -->:

soup.select_one('span:has(.sold-text)').text.split(' ')[-1]

I would recommend filtering for digits instead, which works in both cases:

''.join(filter(str.isdigit, soup.select_one('span:has(.sold-text)').text))

Example

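from bs4 import BeautifulSoup

# Markup from the question; "< !-- -- >" is written with spaces, so it is plain text, not a comment
html = '''
<span class="jsx-302 jsx-385"><span class="jsx-302 jsx-385 sold-text">Sold</span> < !-- -- >377</span>
<span class="jsx-302 jsx-385">Rp41,400 / 100 g</span>
'''
soup = BeautifulSoup(html, 'html.parser')

# Select the span that contains the .sold-text label and keep only the digits of its text
sold = ''.join(filter(str.isdigit, soup.select_one('span:has(.sold-text)').text))

print(sold)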
Output
377
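
If the page really contains <!-- -->, BeautifulSoup parses it as a Comment node, and .text already leaves it out. A minimal sketch that avoids string splitting altogether, assuming the markup below (the question's markup with a real comment in place of the spaced one), is to walk the siblings that follow the "Sold" label and skip any comments:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Assumed markup: same as the question, but with a real HTML comment
html = '<span class="jsx-302 jsx-385"><span class="jsx-302 jsx-385 sold-text">Sold</span> <!-- -->377</span>'
soup = BeautifulSoup(html, 'html.parser')

label = soup.select_one('.sold-text')

# Keep the plain text nodes after the label; Comment is a NavigableString
# subclass, so it has to be excluded explicitly.
parts = [
    str(node)
    for node in label.next_siblings
    if isinstance(node, NavigableString) and not isinstance(node, Comment)
]
sold = ''.join(parts).strip()

print(sold)
```

This only looks at nodes that come after the label inside the same parent, so digits elsewhere in the span do not leak into the result.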

CodePudding user response:

You can get the value 377 easily using the split() method, as follows:

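from bs4 import BeautifulSoup

doc = '''
<span class="jsx-302 jsx-385"><span class="jsx-302 jsx-385 sold-text">Sold</span> < !-- -- >377</span>
'''

soup = BeautifulSoup(doc, 'html.parser')
# The outer span is matched because its class attribute is exactly "jsx-302 jsx-385"
for span in soup.find_all('span', {"class": "jsx-302 jsx-385"}):
    # Everything after the last '>' in the text is the count
    sold = span.text.split('>')[-1].strip()
    print(sold)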

Output:

377
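
Note that split('>') relies on the marker surviving as literal text. With a real <!-- --> comment the text contains no '>', so the whole string "Sold 377" would come back. A possible sketch that covers both variants is to take the trailing number with a regex; sold_count is a hypothetical helper name, and both markup strings are assumptions based on the question:

```python
import re
from bs4 import BeautifulSoup

def sold_count(html):
    """Return the number that ends the 'Sold' span as an int, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    span = soup.select_one('span:has(.sold-text)')
    if span is None:
        return None
    # Digits at the very end of the span's text, ignoring trailing whitespace
    match = re.search(r'(\d+)\s*$', span.get_text())
    return int(match.group(1)) if match else None

# Marker written with spaces: the parser keeps it as plain text
literal = '<span class="jsx-302 jsx-385"><span class="jsx-302 jsx-385 sold-text">Sold</span> < !-- -- >377</span>'
# Real comment: excluded from get_text()
comment = '<span class="jsx-302 jsx-385"><span class="jsx-302 jsx-385 sold-text">Sold</span> <!-- -->377</span>'

print(sold_count(literal), sold_count(comment))
```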