beautifulsoup - scraping an item through class with condition

Time:06-13

For example, I have this HTML:

<div >a</div>
<div >b</div>
<div >c</div>
<div >aaaaaa</div>
...... (the X in the item-X class keeps increasing arbitrarily)
<div >aaaaaa</div>

I want to scrape all elements whose class is item-X, where X is between 5 and 10.

I know how to search with a partial class name:

text = soup.select('div[class*="item-"]')

but I don't know how to add a condition to it.

CodePudding user response:

You can simply use a for loop.

import bs4 as bs

html = """
<div >a</div>
<div >b</div>
<div >c</div>
<div >aaaaaa</div>
<div >aaaaaa</div>
"""

soup = bs.BeautifulSoup(html, 'lxml')

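# range(5, 11) covers 5 through 10 inclusive
for i in range(5, 11):
    # div.item-N matches the whole class name, so item-5 does not also match item-50
    text = soup.select(f"div.item-{i}")
    if text:
        print(text)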
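
One thing to watch out for: the substring selector from the question, div[class*="item-5"], matches any class attribute that contains "item-5", so it would also pick up item-50 or item-57. A small sketch of the difference, using made-up markup (the class numbers here are assumptions for illustration only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class numbers are assumed for illustration.
html = '<div class="item-5">c</div><div class="item-50">x</div>'
soup = BeautifulSoup(html, "html.parser")

# Substring match on the class attribute: also catches item-50
substring = [t.get_text() for t in soup.select('div[class*="item-5"]')]

# Class selector: matches the class token item-5 exactly
exact = [t.get_text() for t in soup.select("div.item-5")]

print(substring)  # ['c', 'x']
print(exact)      # ['c']
```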

CodePudding user response:

You can use multiple CSS selectors joined by a comma (,):

from bs4 import BeautifulSoup

html_doc = """\
<div >a</div>
<div >b</div>
<div >c</div>
<div >aaaaaa</div>
<div >aaaaaa</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

texts = soup.select(",".join(f"div.item-{i}" for i in range(5, 11)))
for text in texts:
    print(text)

Prints:

<div >c</div>
<div >aaaaaa</div>
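
If the range is large, or the condition is more involved than a simple range, another option is to pass a function to find_all and check the number yourself. This is only a sketch: the item_in_range helper, the ITEM_CLASS pattern, and the sample markup (including its class numbers) are my own assumptions, not something from the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class numbers are assumed for illustration.
html_doc = """
<div class="item-1">a</div>
<div class="item-5">c</div>
<div class="item-7 extra">d</div>
<div class="item-10">e</div>
<div class="item-50">f</div>
"""

# Matches a single class token of the form item-<digits>
ITEM_CLASS = re.compile(r"^item-(\d+)$")


def item_in_range(tag, low=5, high=10):
    """Return True for a div with an item-X class where low <= X <= high."""
    if tag.name != "div":
        return False
    # Beautiful Soup exposes class as a list of tokens
    for cls in tag.get("class", []):
        match = ITEM_CLASS.match(cls)
        if match and low <= int(match.group(1)) <= high:
            return True
    return False


soup = BeautifulSoup(html_doc, "html.parser")
matches = soup.find_all(item_in_range)
texts = [tag.get_text() for tag in matches]
print(texts)  # ['c', 'd', 'e']
```

The comparison is done on the parsed integer, so item-50 is not confused with item-5, and changing the bounds does not require building a longer selector string.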