Trying to scrape information held in two spans of two classes of the same name

Time:03-01

I would like to start off by saying that I am not good at any of this!

I'm trying to learn how to do web scraping in Python, and I'm working on a personal project of mine.

This website (https://www.stilltasty.com/) lists rough durations of various food products before they expire. When inspecting the site, I found that the rough duration of the food is held within what looks like an image.

Looking at the markup, both duration strings are in span elements.

The code below gives me the first duration, the one held within the first arrow:

from bs4 import BeautifulSoup
import requests

url = "https://www.stilltasty.com/fooditems/index/17130"

result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
print(doc.find('div', class_='red-arrow').find_next('span'))

However, I would like to get both durations, and I haven't had much luck yet. I tried using find_all and looping over the results, filtering by the span, but I get completely different results. As a matter of fact, I do get two results, but my output is the "Blue Arrow" img rather than the spans.

I would appreciate any help with this. Thank you in advance!

CodePudding user response:

First collect all the elements with the class red-arrow, then iterate over the ResultSet and take the text from the next span after each one:

for e in doc.find_all('div', class_='red-arrow'):
    print(e.find_next('span').get_text(strip=True))

As an alternative, you could use CSS selectors and chain them:

for e in doc.select('div.red-arrow span'):
    print(e.get_text(strip=True))

Example

from bs4 import BeautifulSoup
import requests

url = "https://www.stilltasty.com/fooditems/index/17130"

result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")

for e in doc.find_all('div', class_='red-arrow'):
    print(e.find_next('span').get_text(strip=True))

Output

3-5 days
1-2 months
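
To try both approaches above without hitting the site, here is a minimal offline sketch. The HTML string is a made-up stand-in: it assumes each div.red-arrow contains an img followed by a span with the duration, which may not match the real markup of stilltasty.com. The names SAMPLE_HTML, via_find_next and via_select are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, NOT copied from stilltasty.com: an assumption about
# how the arrows and their duration spans might be nested.
SAMPLE_HTML = """
<div class="food-storage-right">
  <div class="red-arrow"><img src="arrow.png" alt="arrow"><span>3-5 days</span></div>
  <div class="red-arrow"><img src="arrow.png" alt="arrow"><span>1-2 months</span></div>
</div>
"""

doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Approach 1: find every red-arrow div, then take the next span in document order.
via_find_next = [
    div.find_next("span").get_text(strip=True)
    for div in doc.find_all("div", class_="red-arrow")
]

# Approach 2: a CSS descendant selector that matches spans inside red-arrow divs.
via_select = [
    span.get_text(strip=True)
    for span in doc.select("div.red-arrow span")
]

print(via_find_next)
print(via_select)
```

On this sample both lists come out the same. Note that find_next walks forward through the whole document, so it also finds a span that comes after the div rather than inside it, whereas the selector only matches spans nested in the div.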

CodePudding user response:

In addition to @HedgeHog's answer, another way to do it is to use a CSS selector on the container div:

for duration in doc.select('div.food-storage-right span'):
    print(duration.text.strip())

Output:

3-5 days
1-2 months
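
If this ends up in a larger project, one possible way to organise it is to keep downloading and parsing separate, so the parsing can be checked against a saved or hand-written page. The sketch below assumes the selector from this answer; fetch_html, extract_durations and the sample string are hypothetical names and data, not part of requests or BeautifulSoup.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return response.text


def extract_durations(html, selector="div.food-storage-right span"):
    """Return the stripped text of every element matching the CSS selector."""
    doc = BeautifulSoup(html, "html.parser")
    texts = (el.get_text(strip=True) for el in doc.select(selector))
    return [t for t in texts if t]  # drop spans that contain no text


# Hypothetical markup for an offline check, NOT the site's real HTML.
sample = (
    '<div class="food-storage-right">'
    "<span>3-5 days</span><span> </span><span>1-2 months</span>"
    "</div>"
)
print(extract_durations(sample))
```

Against the live page this would be called as extract_durations(fetch_html(url)), with url set as in the question.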