Beginner - trying to scrape link and export to excel in Python and BS4

Time:12-26

I have tried to scrape a demo site in a loop.

CodePudding user response:

Try printing out the laptop data. You will see that the output is the same information that ends up in the Excel file:

<div >
<div >
<img alt="item"  src="/images/test-sites/e-commerce/items/cart2.png"/>
<div >
<h4 >$1799.00</h4>
<h4>
<a  href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>
</h4>
<p >Asus ROG Strix SCAR Edition GL503VM-ED115T, 15.6" FHD 120Hz, Core i7-7700HQ, 16GB, 256GB SSD   1TB SSHD, GeForce GTX 1060 6GB, Windows 10 Home</p>
</div>
<div >
<p >8 reviews</p>
<p data-rating="3">
<span ></span>
<span ></span>
<span ></span>
</p>
</div>
</div>
</div>

The part you say you want to extract is the link, which is found here:

<a  href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>

One way to reach the link is to find the a tag inside the div tag that contains it. This first version prints the link's visible text:

for laptop in laptops:
    laptop_link = laptop.find('a') # Find the title link
    text = laptop_link.get_text()
    print(text)
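
Note that in the markup above the visible text is truncated ("Asus ROG Strix S..."), while the title attribute holds the full product name. The following is a minimal sketch of reading both, assuming a trimmed copy of that markup; the helper name link_names and selecting products with find_all("div") are illustrative assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the product markup shown above (class attributes omitted).
html = """
<div>
<h4>$1799.00</h4>
<h4>
<a href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>
</h4>
</div>
"""


def link_names(markup):
    """Return (visible text, title attribute) for the first link in each div."""
    soup = BeautifulSoup(markup, "html.parser")
    # Assumption: one div per product in this trimmed sample.
    laptops = soup.find_all("div")
    names = []
    for laptop in laptops:
        laptop_link = laptop.find("a")
        if laptop_link is None:  # find() returns None when nothing matches
            continue
        short_name = laptop_link.get_text(strip=True)  # truncated display text
        full_name = laptop_link.get("title")  # full name, or None if absent
        names.append((short_name, full_name))
    return names


for short_name, full_name in link_names(html):
    print(short_name, "|", full_name)
```

Checking for None before calling get_text() avoids an AttributeError on any product block that happens to contain no link.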

Then, to get the hyperlink itself as opposed to the text inside, you need to get the tag's href attribute, like this:

for laptop in laptops:
    laptop_link = laptop.find('a') # Find the title link
    href = laptop_link['href'] # Get the link's href attribute
    print(href)
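
The href in this markup is a relative path, so it usually needs to be joined with the site's address before it is useful in a spreadsheet. Here is a rough sketch of that step using only the standard library. BASE_URL is a hypothetical placeholder, the helper names to_rows and write_csv are made up for illustration, and CSV is used here as a stand-in that Excel can open rather than a native .xlsx export.

```python
import csv
import io
from urllib.parse import urljoin

# Hypothetical placeholder: substitute the address of the page you scraped.
BASE_URL = "https://example.com/"


def to_rows(hrefs, base_url=BASE_URL):
    """Pair each relative href with its absolute URL."""
    return [[href, urljoin(base_url, href)] for href in hrefs]


def write_csv(rows, fileobj):
    """Write a header line and the link rows to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["href", "url"])
    writer.writerows(rows)


# The href value taken from the markup above.
hrefs = ["/test-sites/e-commerce/allinone/product/544"]
rows = to_rows(hrefs)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To produce a real file, pass open("laptops.csv", "w", newline="") instead of the buffer (the filename is only an example).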