I'm trying to learn something today by doing a bit of scraping.
I want to put product names and their corresponding image URLs into a spreadsheet.
I managed to extract the names, but the images don't seem to work. Hopefully you can help!
Here is the code I use for extracting the text:
results[0].find('p', {'class': 'product-card__name'}).get_text()
Here is what I thought would extract the image:
results[0].find('img', {'class':'product-card__image'}).get_src()
This obviously doesn't work. It raises TypeError: 'NoneType' object is not callable.
Any pointers?
For reference, below is the source I am trying to scrape.
<li><a href="/p/63818/bumbu-the-original-rum-glass-pack" title=" Bumbu The Original Rum Glass Pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])"><div><img class="product-card__image" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" alt="Bumbu The Original Rum Glass Pack" loading="lazy" width="3" height="4"></div><div><p class="product-card__name"> Bumbu The Original Rum<span>Glass Pack</span></p><p> 70cl / 40% </p></div><div><p> £39.95 </p><p> (£57.07 per litre) </p></div></a></li>
Answer:
To grab the image URL, read the tag's src attribute with .get('src') instead of calling .get_src(), which is not a BeautifulSoup method:
results[0].find('img', {'class':'product-card__image'}).get('src')
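Why the original call produces that particular error: in BeautifulSoup, accessing an attribute that is not a real Tag method, such as img.get_src, is treated as shorthand for img.find('get_src'), i.e. a search for a child <get_src> element. There is no such child, so it evaluates to None, and calling None() raises the TypeError. The sketch below uses a trimmed-down copy of the question's markup and assumes the <img> carries class="product-card__image", as the question's code expects; data-src is just an example of an attribute that is not present.

```python
from bs4 import BeautifulSoup

# Trimmed-down markup; the class name is taken from the question's code
html = '<div><img class="product-card__image" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg"/></div>'
soup = BeautifulSoup(html, "html.parser")
img = soup.find('img', {'class': 'product-card__image'})

# img.get_src is really img.find('get_src'): no <get_src> child exists, so it is None
print(img.get_src is None)  # True

# .get() reads an HTML attribute and returns None if it is missing
print(img.get('src'))       # https://img.thewhiskyexchange.com/480/rum_bum4.jpg
print(img.get('data-src'))  # None

# Subscripting also reads an attribute, but raises KeyError if it is missing
try:
    img['data-src']
except KeyError:
    print("no data-src attribute")
```

In short, img['src'] and img.get('src') both work; prefer .get() when some items might lack the attribute.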
Example:
html='''
<li >
<a href="/p/63818/bumbu-the-original-rum-glass-pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])" title=" Bumbu The Original Rum Glass Pack">
<div >
<img alt="Bumbu The Original Rum Glass Pack" class="product-card__image" height="4" loading="lazy" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" width="3"/>
</div>
<div >
<p >
Bumbu The Original Rum
<span >
Glass Pack
</span>
</p>
<p >
70cl / 40%
</p>
</div>
<div >
<p >
£39.95
</p>
<p >
(£57.07 per litre)
</p>
</div>
</a>
</li>
'''
from bs4 import BeautifulSoup
soup=BeautifulSoup(html, "html.parser")
# print(soup.prettify())  # uncomment to inspect the parsed markup
print(soup.find('img', {'class':'product-card__image'}).get('src'))
Output:
https://img.thewhiskyexchange.com/480/rum_bum4.jpg
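Since the original goal was a spreadsheet of every product name and image URL, here is a minimal sketch that loops over all product cards instead of only results[0] and writes a CSV file that Excel or Google Sheets can open. It assumes each card is an <li> containing a <p class="product-card__name"> and an <img class="product-card__image"> (class names from the question's code); soup.find_all('li') stands in for however results was built, and the second product, its example.com URL, and the products.csv filename are made up for illustration. get_text(' ', strip=True) is used because, on the one-line markup in the question, a plain get_text() runs the name and the <span> text together ("Bumbu The Original RumGlass Pack").

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical two-product listing; only the first item comes from the question
html = '''
<ul>
<li><div><img class="product-card__image" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg"></div><div><p class="product-card__name"> Bumbu The Original Rum<span>Glass Pack</span></p></div></li>
<li><div><img class="product-card__image" src="https://example.com/example.jpg"></div><div><p class="product-card__name"> Example Product</p></div></li>
</ul>
'''
soup = BeautifulSoup(html, "html.parser")

rows = []
for card in soup.find_all('li'):
    name_tag = card.find('p', {'class': 'product-card__name'})
    img_tag = card.find('img', {'class': 'product-card__image'})
    # Skip cards missing either element instead of crashing on None
    if name_tag is None or img_tag is None:
        continue
    rows.append({
        # ' ' separator keeps "Rum" and "Glass Pack" apart; strip=True trims whitespace
        'name': name_tag.get_text(' ', strip=True),
        'image_url': img_tag.get('src'),
    })

# newline='' is what the csv module documentation recommends for written files
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'image_url'])
    writer.writeheader()
    writer.writerows(rows)

for row in rows:
    print(row['name'], '|', row['image_url'])
```

Writing CSV keeps the example dependency-free beyond BeautifulSoup; a library such as openpyxl could write .xlsx instead if a native Excel file is needed.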