Scraping what's inside the links

Time:11-13

I don't have any code for this problem yet, but I will try my best to explain it.

For example, say you are scraping a website that contains 3 different links, and you want to scrape what is inside each and every one of them without doing it manually. Is this possible with just BeautifulSoup and the Requests library? Or would you have to use another library, e.g. Scrapy?

If you want, you can try it on https://www.bleepingcomputer.com/ as an example. What I am trying to achieve is to scrape the website and what is inside its links at the same time.

If it's not possible to do it with only Requests and BeautifulSoup, feel free to use Scrapy as well.

CodePudding user response:

You can scrape the links via the <a> tag. The page's HTML lists each hyperlink, and the actual address it links to is given in the href attribute. Ex:

<a href="https://google.com">Site 1</a> The href is the destination link, and "Site 1" is just the text shown on the page.
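To make the href idea concrete, here is a minimal sketch of collecting every link from a piece of HTML with BeautifulSoup. The function name extract_links and the sample HTML are made up for illustration; relative links are resolved against a base URL with urljoin from the standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> in html, in page order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchor tags that have no href attribute at all
    for a in soup.find_all("a", href=True):
        url = urljoin(base_url, a["href"])  # turns "/news/" into a full URL
        if url not in links:
            links.append(url)
    return links


# Hypothetical sample markup, not taken from the real site
sample = (
    '<ul>'
    '<li><a href="https://google.com">Site 1</a></li>'
    '<li><a href="/news/">News</a></li>'
    '</ul>'
)
print(extract_links(sample, "https://www.bleepingcomputer.com/"))
```

On a real page you would probably narrow the search, for example to links inside a particular container, since find_all("a") also returns navigation and footer links.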

CodePudding user response:

You can do it with only requests and BeautifulSoup. Just add the links to a list or a dict and iterate over it, requesting each link in turn.
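A rough sketch of that approach, assuming you only want the <title> of each linked page (swap in whatever you actually need to extract). The names fetch_html and scrape_linked_pages, the limit parameter, and the choice to keep only http(s) links are my own assumptions, not something the site requires.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def scrape_linked_pages(start_url, fetch=fetch_html, limit=3):
    """Scrape start_url, then visit up to `limit` of its links.

    Returns a dict mapping each visited link to that page's <title> text.
    `fetch` is a parameter so the download step can be swapped out.
    """
    soup = BeautifulSoup(fetch(start_url), "html.parser")

    # Step 1: add the links to a list (absolute URLs, no duplicates)
    links = []
    for a in soup.find_all("a", href=True):
        url = urljoin(start_url, a["href"])
        if url.startswith("http") and url not in links:
            links.append(url)

    # Step 2: iterate over the list and scrape each linked page
    results = {}
    for url in links[:limit]:
        page = BeautifulSoup(fetch(url), "html.parser")
        results[url] = page.title.get_text(strip=True) if page.title else ""
    return results
```

Calling scrape_linked_pages("https://www.bleepingcomputer.com/") would then request the front page plus the first 3 links found on it. Some sites reject requests that lack browser-like headers, so you may need to pass headers to requests.get; for larger crawls, Scrapy handles queuing and throttling for you.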
