Scraping list with the same class

Time:03-16

I am trying to scrape a list of keywords from a site, but the keywords are spread across several elements that all share the same class name.

<div class="keywords content-div">
<span class="keyword key-content">
<a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant">Restaurant</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/fried-chicken">Fried Chicken</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant-order-in">Restaurant Order In</a>
</span>
<span class="keyword key-content">
<a href="/en/keyword/restaurant-eat-out">Restaurant Eat Out</a>
</span>
</div>
</div>

This is how the data is stored in the HTML. I am only interested in the link text that follows each href:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://yellowpages.com.eg/en/profile/5-roosters-fried-chicken/629053?position=1&key=Fast-Food&mod=category&categoryId=1527')
soup = BeautifulSoup(r.content, 'lxml')
word = soup.find_all('div', class_='keywords content-div')
for item in word:
    keywords = soup.find('span', class_='keyword key-content').find('a').text
    print(keywords)

Here is my code, but it only fetches the first keyword and I need the whole list.

Answer:

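The loop in the question calls soup.find(...) on the whole document in every iteration, and find returns only the first match, so only the first keyword is ever printed. Instead, find all matching <div> nodes, then all <span> nodes inside each <div>, then all <a> nodes inside each <span>, and read the text of each one.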

Code:

from bs4 import BeautifulSoup

html = ...  # response.content

soup = BeautifulSoup(html, 'html.parser')
for div in soup.find_all('div', class_='keywords content-div'):
    for span in div.find_all('span', class_='keyword key-content'):
        for a in span.find_all('a'):
            print(a.text)

Output:

Chicken Restaurant
Restaurant
Fried Chicken
Restaurant Order In
Restaurant Eat Out
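
The same idea can also be applied as a small change to the loop from the question: search inside each matched div and use find_all instead of find. The following is a minimal sketch, not the only way to do it. The SAMPLE_HTML string and the extract_keywords helper are made up for illustration, and the class names in the sample are assumed from the selectors used in the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class names are assumed from the
# selectors used in the question's code, not taken from the live page.
SAMPLE_HTML = """
<div class="keywords content-div">
  <span class="keyword key-content"><a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/restaurant">Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/fried-chicken">Fried Chicken</a></span>
</div>
"""


def extract_keywords(html):
    """Return the link text of every keyword span as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    keywords = []
    for item in soup.find_all("div", class_="keywords content-div"):
        # Search inside the current div (item), not the whole soup, and use
        # find_all so that every span is visited rather than only the first.
        for span in item.find_all("span", class_="keyword key-content"):
            link = span.find("a")
            if link is not None:  # skip spans that contain no link
                keywords.append(link.get_text(strip=True))
    return keywords


print(extract_keywords(SAMPLE_HTML))
```

Collecting the results into a list, rather than printing inside the loop, makes it easier to reuse the keywords later.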

Alternatively, you can use a CSS selector:

soup = BeautifulSoup(html, 'html.parser')
for a in soup.select('div.keywords.content-div > span.keyword.key-content > a'):
    print(a.text)
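
If the link targets are needed as well as the text, the same selector can return both. This is a rough sketch under the same assumptions as before: SAMPLE_HTML and extract_keyword_links are hypothetical names, and the class names in the sample markup are inferred from the selector rather than confirmed against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; class names are assumed from the selector.
SAMPLE_HTML = """
<div class="keywords content-div">
  <span class="keyword key-content"><a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/restaurant">Restaurant</a></span>
</div>
"""

# "." chains class names on one element; ">" matches direct children only.
SELECTOR = "div.keywords.content-div > span.keyword.key-content > a"


def extract_keyword_links(html):
    """Return a list of (text, href) pairs, one per matching <a> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.select(SELECTOR)]


for text, href in extract_keyword_links(SAMPLE_HTML):
    print(text, href)
```

Using a.get("href") returns None instead of raising an error when a link has no href attribute.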

