Difficulty Scraping Numerical Values from two divs with the same class

Time:04-24

I am new to web scraping and want to grab the "rank" and "Item No." values from this URL. My ultimate goal is to save this info in a CSV file and plot the data. The issue is that the two values sit in two different divs with the same class name, "item_stat".

<div class="item_stats">
    <div class="item_stat">
        rank
        <span>
            1
        </span>
    </div>
    <div class="item_stat">
        item no.
        <span>
            #3251
        </span>
    </div>
</div>

I am using the following code to grab the "rank" value.

import re

import requests
from bs4 import BeautifulSoup as bs

page = requests.get(URL)
soup = bs(page.content, 'html.parser')
soup2 = bs(soup.prettify(), "html.parser")
lists = soup2.find('div', class_="featured_item")
stats = lists.find('div', class_="item_stats")
stats_val = lists.find('div', class_="item_stat")
rank = stats_val.text.replace('<', '')
rank_val = re.findall(r'\d+', rank)

Output:

   ['1']

I think I want this value as a float, and I also do not know how to find the "Item No." value. Using get_text() and .text.replace() gives me errors I haven't had with other scraping projects. I appreciate any advice, thanks.

CodePudding user response:

As one approach, you could select all <span>s inside your item_stats element and extract their text:

rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()

Example

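from bs4 import BeautifulSoup

html = '''
<div class="item_stats">
    <div class="item_stat">
        rank
        <span>
            1
        </span>
    </div>
    <div class="item_stat">
        item no.
        <span>
            #3251
        </span>
    </div>
</div>
'''

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')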

rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()

print(rank)
print(number)

Output

1.0
#3251
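
Since the stated goal is a CSV file, here is a minimal sketch of one way to get there. It reads each value by its label instead of by position, so it does not rely on "rank" always coming first. The class names item_stats and item_stat are taken from the question; the helper parse_stats, the column name item_no, and stripping the leading "#" are assumptions made for illustration, and the CSV is written to an in-memory buffer rather than a real file.

```python
import csv
import io

from bs4 import BeautifulSoup

html = """
<div class="item_stats">
    <div class="item_stat">
        rank
        <span>1</span>
    </div>
    <div class="item_stat">
        item no.
        <span>#3251</span>
    </div>
</div>
"""


def parse_stats(stats_div):
    """Hypothetical helper: map each stat label to the text of its <span>."""
    result = {}
    for stat in stats_div.select("div.item_stat"):
        span = stat.find("span")
        if span is None:
            continue  # skip stat blocks without a value
        # The first non-empty string inside the div is the label, e.g. "rank"
        label = next(stat.stripped_strings).lower()
        result[label] = span.get_text(strip=True)
    return result


soup = BeautifulSoup(html, "html.parser")

rows = []
for stats_div in soup.select("div.item_stats"):
    stats = parse_stats(stats_div)
    rows.append({
        "rank": float(stats["rank"]),
        # Assumption: drop the leading "#" so the column holds only the number
        "item_no": stats["item no."].lstrip("#"),
    })

# Write to an in-memory buffer; swap in open("items.csv", "w", newline="")
# to write a real file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["rank", "item_no"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

On a real page with several featured items, the same loop would produce one row per item_stats block, provided each block follows the structure shown in the question.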