I have tried to loop some web scraping from a demo site
CodePudding user response:
Try printing out the laptop data. You will see that the output is the same information that ended up in your Excel file:
<div >
<div >
<img alt="item" src="/images/test-sites/e-commerce/items/cart2.png"/>
<div >
<h4 >$1799.00</h4>
<h4>
<a href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>
</h4>
<p >Asus ROG Strix SCAR Edition GL503VM-ED115T, 15.6" FHD 120Hz, Core i7-7700HQ, 16GB, 256GB SSD 1TB SSHD, GeForce GTX 1060 6GB, Windows 10 Home</p>
</div>
<div >
<p >8 reviews</p>
<p data-rating="3">
<span ></span>
<span ></span>
<span ></span>
</p>
</div>
</div>
</div>
The part you say you want to extract is the link, which is found here:
<a href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>
One way you could get the link is by finding this tag inside the div tag it's located in:
for laptop in laptops:
    laptop_link = laptop.find('a')  # Find the title link
    text = laptop_link.get_text()
    print(text)
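Note that in the markup above the visible link text is shortened ("Asus ROG Strix S..."), while the tag's title attribute carries the full product name. A minimal sketch of reading both, assuming BeautifulSoup (bs4) is installed and using a trimmed copy of the snippet above rather than the live page:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the title link from the markup shown above.
html = (
    '<h4><a href="/test-sites/e-commerce/allinone/product/544" '
    'title="Asus ROG Strix SCAR Edition GL503VM-ED115T">'
    'Asus ROG Strix S...</a></h4>'
)

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

short_name = link.get_text(strip=True)  # visible, shortened text
full_name = link.get("title")           # full name; None if the attribute is absent

print(short_name)
print(full_name)
```

If you want the full name in your spreadsheet, the title attribute is probably the better source than get_text().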
Then, to get the hyperlink itself as opposed to the text inside, you need to get the tag's href attribute, like this:
for laptop in laptops:
    laptop_link = laptop.find('a')  # Find the title link
    href = laptop_link['href']  # Get the link attribute
    print(href)
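The href in this markup is a relative path, so it usually needs to be joined with the site's base URL before it can be opened or stored. Here is a rough, self-contained sketch of that step. BASE_URL is a placeholder (not taken from the question) and extract_links is a hypothetical helper name; swap in the real origin of the site you are scraping:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Placeholder origin (assumption): replace with the real site's base URL.
BASE_URL = "https://example.com"

# Trimmed copy of one product block from the markup shown above.
SAMPLE_HTML = """
<div>
  <h4>$1799.00</h4>
  <h4>
    <a href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>
  </h4>
</div>
"""


def extract_links(html, base_url):
    """Return a list of {'title', 'url'} dicts for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a", href=True):  # skip anchors without href
        results.append({
            # Fall back to the visible text when there is no title attribute.
            "title": link.get("title", link.get_text(strip=True)),
            # Turn the relative path into an absolute URL.
            "url": urljoin(base_url, link["href"]),
        })
    return results


links = extract_links(SAMPLE_HTML, BASE_URL)
for item in links:
    print(item["title"], item["url"])
```

Filtering with href=True avoids a KeyError on anchors that have no href, which laptop_link['href'] would raise.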