I am new to web scraping and want to grab the "rank" and "Item No." values from this url. My ultimate goal is to save this info in a CSV and plot the data. The issue is that the two values sit in two different divs with the same class name, "item_stat".
<div class="item_stats">
<div class="item_stat">
rank
<span>
1
</span>
</div>
<div class="item_stat">
item no.
<span>
#3251
</span>
</div>
</div>
I am using the following code to grab the "rank" value.
import re
import requests
from bs4 import BeautifulSoup as bs

page = requests.get(URL)
soup = bs(page.content, 'html.parser')
soup2 = bs(soup.prettify(), "html.parser")
lists = soup2.find('div', class_="featured_item")
stats = lists.find('div', class_="item_stats")
stats_val = lists.find('div', class_="item_stat")
rank = stats_val.text.replace('<', '')
rank_val = re.findall(r'\d+', rank)
Output:
['1']
I think I want this value as a float, and I also do not know how to find the "Item No." value. Using get_text() and .text.replace() gives me errors I haven't had with other scraping projects. I appreciate any advice, thanks.
CodePudding user response:
One approach is to select all the <span> elements inside your item_stats container and extract their text:
rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()
Example
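from bs4 import BeautifulSoup

html = '''
<div class="item_stats">
<div class="item_stat">
rank
<span>
1
</span>
</div>
<div class="item_stat">
item no.
<span>
#3251
</span>
</div>
</div>
'''
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
# float() ignores the whitespace around the digit; the item number stays a string because of the '#'
rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()
print(rank)
print(number)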
Output
1.0
#3251
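Since the stated goal is to end up with a CSV, here is a rough sketch of one way to pair each label with its value instead of relying on the position of the spans, and then write the result out. It assumes the class names item_stats and item_stat from the question; parse_stats, the item_no column name, and the in-memory buffer are only illustrative choices, not part of the original page or code.

```python
import csv
import io

from bs4 import BeautifulSoup

# Sample markup; the class names are assumed from the question text
html = '''
<div class="item_stats">
  <div class="item_stat">rank <span>1</span></div>
  <div class="item_stat">item no. <span>#3251</span></div>
</div>
'''


def parse_stats(markup):
    """Return {label: value} for every .item_stat div in the markup."""
    soup = BeautifulSoup(markup, 'html.parser')
    stats = {}
    for div in soup.select('.item_stats .item_stat'):
        # stripped_strings yields the label text first, then the <span> text
        parts = list(div.stripped_strings)
        if len(parts) >= 2:
            stats[parts[0]] = parts[-1]
    return stats


stats = parse_stats(html)

# Convert to numeric types so the values can be plotted later
row = {
    'rank': float(stats['rank']),
    'item_no': int(stats['item no.'].lstrip('#')),
}

# Written to an in-memory buffer here; for a real file, use
# open('items.csv', 'w', newline='') instead (the file name is hypothetical)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['rank', 'item_no'])
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Looking values up by label means the code keeps working if the page adds another stat or reorders them, whereas indexing with [0] and [1] would silently pick up the wrong span.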