I am trying to scrape a list of keywords from a site, but the keywords are stored in several elements that share the same class name.
<div class="keywords content-div">
  <span class="keyword key-content">
    <a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a>
  </span>
  <span class="keyword key-content">
    <a href="/en/keyword/restaurant">Restaurant</a>
  </span>
  <span class="keyword key-content">
    <a href="/en/keyword/fried-chicken">Fried Chicken</a>
  </span>
  <span class="keyword key-content">
    <a href="/en/keyword/restaurant-order-in">Restaurant Order In</a>
  </span>
  <span class="keyword key-content">
    <a href="/en/keyword/restaurant-eat-out">Restaurant Eat Out</a>
  </span>
</div>
</div>
This is how the data is stored in the HTML. I am only interested in the text after the href, i.e. the link text such as "Chicken Restaurant". Here is my code:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://yellowpages.com.eg/en/profile/5-roosters-fried-chicken/629053?position=1&key=Fast-Food&mod=category&categoryId=1527')
soup = BeautifulSoup(r.content, 'lxml')
word = soup.find_all('div', class_='keywords content-div')
for item in word:
    keywords = soup.find('span', class_='keyword key-content').find('a').text
    print(keywords)
My code only fetches the first keyword, but I need the whole list.
Answer:
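Your loop calls soup.find(...), which searches the whole document and returns only the first matching <span>. Instead, find all <div> nodes, then all child <span> nodes of each <div>, then all child <a> nodes of each <span>, and retrieve their text.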
Code:
from bs4 import BeautifulSoup

html = ...  # response.content
soup = BeautifulSoup(html, 'html.parser')
for div in soup.find_all('div', class_='keywords content-div'):
    for span in div.find_all('span', class_='keyword key-content'):
        for a in span.find_all('a'):
            print(a.text)
Output:
Chicken Restaurant
Restaurant
Fried Chicken
Restaurant Order In
Restaurant Eat Out
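If you want to keep your original loop, the minimal fix is to search inside each item instead of the whole soup, and to use find_all for the spans. The sketch below runs that fix against the sample markup from the question rather than the live site; the class attributes are assumed from the class_ arguments in the question's code, and collecting the results into a list named keywords is just an illustrative choice.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (class names assumed from the question's code).
html = """
<div class="keywords content-div">
  <span class="keyword key-content"><a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/restaurant">Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/fried-chicken">Fried Chicken</a></span>
  <span class="keyword key-content"><a href="/en/keyword/restaurant-order-in">Restaurant Order In</a></span>
  <span class="keyword key-content"><a href="/en/keyword/restaurant-eat-out">Restaurant Eat Out</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

keywords = []
for item in soup.find_all("div", class_="keywords content-div"):
    # Search inside the current <div> (item), not the whole document (soup),
    # and use find_all so every matching <span> is returned, not just the first.
    for span in item.find_all("span", class_="keyword key-content"):
        keywords.append(span.find("a").get_text(strip=True))

print(keywords)
```

get_text(strip=True) trims the surrounding whitespace, which is handy if the real page has the link text on its own indented line.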
Alternatively, you can use a CSS selector:
soup = BeautifulSoup(html, 'html.parser')
for a in soup.select('div.keywords.content-div > span.keyword.key-content > a'):
    print(a.text)
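Since the same selector returns the <a> elements themselves, you can collect both the link text and the href in one pass. This is a sketch against a trimmed copy of the sample markup (not the live page); the names keywords and links are illustrative, and soup.select relies on the soupsieve package that ships with recent Beautiful Soup versions.

```python
from bs4 import BeautifulSoup

# Trimmed sample markup from the question (class names assumed from the question's code).
html = """
<div class="keywords content-div">
  <span class="keyword key-content"><a href="/en/keyword/chicken-restaurant">Chicken Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/restaurant">Restaurant</a></span>
  <span class="keyword key-content"><a href="/en/keyword/fried-chicken">Fried Chicken</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ">" is the child combinator: each <a> must sit directly inside a matching <span>,
# which in turn sits directly inside the matching <div>.
selector = "div.keywords.content-div > span.keyword.key-content > a"
anchors = soup.select(selector)

keywords = [a.get_text(strip=True) for a in anchors]       # link text only
links = {a.get_text(strip=True): a["href"] for a in anchors}  # text -> href

print(keywords)
print(links["Fried Chicken"])
```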