How can you extract information from links which are stored in a list?

I want to visit each link in this list and extract certain information (name, address, phone number, and email address of the company) from the page behind it:

['https://allianz-entwicklung-klima.de/kompensationspartner/aera-group/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/atmosfair-ggmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/bischoff-ditze-energy-gmbh-co-kg/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/climate-extender-gmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/climatepartner-gmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/die-klimamanufaktur-gmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/die-ofenmacher-e-v/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/first-climate/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/fokus-zukunft-gmbh-co-kg/']

In the end, all of the information should be stored in a table. I tried a for loop, but it only works for the first link and not for the others.

I'd be grateful for any help.

CodePudding user response：

Personally, I would use Selenium WebDriver for any web scraping. It lets you automate your browser from code: it can go to each of those links, select what you need, store the values, and return them.
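A minimal sketch of that approach, assuming Selenium 4 and a local Chrome installation. It also assumes, without having checked the pages, that the company name is the first `<h1>` on each page. The helper names `collect_names` and `main` are made up for illustration.

```python
def collect_names(driver, links):
    """Visit every link with an already started Selenium driver; return one row per page."""
    rows = []
    for link in links:
        driver.get(link)  # load the page in the automated browser
        # "tag name" is the locator string behind By.TAG_NAME.
        # Assumption: the company name is the first <h1> on the page.
        heading = driver.find_element("tag name", "h1")
        # Append inside the loop so every link contributes a row.
        rows.append({"url": link, "name": heading.text.strip()})
    return rows


def main():
    # Imported here so collect_names can be reused without a browser installed.
    from selenium import webdriver

    links = [
        "https://allianz-entwicklung-klima.de/kompensationspartner/aera-group/",
        "https://allianz-entwicklung-klima.de/kompensationspartner/atmosfair-ggmbh/",
    ]
    driver = webdriver.Chrome()
    try:
        for row in collect_names(driver, links):
            print(row)
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling `main()` opens Chrome and prints one dictionary per link. The other fields (address, number, mail) would need their own locators, which depend on the actual page markup.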

CodePudding user response：

You could use the Python libraries requests and BeautifulSoup to scrape these pages. I have written a small piece of code below; I have not had time to test it, but it should work. You still have to extract the information you need with Beautiful Soup and store it, perhaps in a list of dictionaries like this:

data = [{"name": "", "address": "", "number": "", "mail": ""}]
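A list of dictionaries in that shape maps directly onto a table, which is what the question asks for. Here is a small sketch using only the standard library; the function name `rows_to_csv` and the sample values are placeholders, not real company data.

```python
import csv
import io

FIELDS = ["name", "address", "number", "mail"]


def rows_to_csv(rows, fields=FIELDS):
    """Serialise a list of dictionaries to CSV text, one line per company."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({field: row.get(field, "") for field in fields})
    return buffer.getvalue()


# Placeholder rows for illustration only.
data = [
    {"name": "Example GmbH", "address": "Musterstr. 1, 12345 Berlin",
     "number": "+49 30 000000", "mail": "info@example.com"},
    {"name": "Sample e.V.", "mail": "kontakt@example.org"},
]
csv_text = rows_to_csv(data)
print(csv_text)
```

To keep the table, write `csv_text` to a file opened with `encoding="utf-8"` and `newline=""`. If pandas is installed, `pandas.DataFrame(data)` should give a comparable table in memory.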

import requests
from bs4 import BeautifulSoup

links = ['https://allianz-entwicklung-klima.de/kompensationspartner/aera-group/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/atmosfair-ggmbh/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/bischoff-ditze-energy-gmbh-co-kg/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/climate-extender-gmbh/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/climatepartner-gmbh/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/die-klimamanufaktur-gmbh/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/die-ofenmacher-e-v/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/first-climate/',
        'https://allianz-entwicklung-klima.de/kompensationspartner/fokus-zukunft-gmbh-co-kg/']

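for link in links:
    # Download one page per link and parse its HTML.
    page = requests.get(link, timeout=10)
    soup = BeautifulSoup(page.content, "html.parser")
    # Extract the fields you need from soup here, inside the loop.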
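The loop above stops at parsing. A possible way to finish it is sketched here, under loud assumptions: the selectors (`h1`, `address`, `tel:` and `mailto:` links) are guesses about the page markup and were not checked against the real site, and the function names `parse_partner` and `scrape` are hypothetical. The key point for the "only the first link works" problem is that the result is appended inside the loop, once per link.

```python
import requests
from bs4 import BeautifulSoup


def parse_partner(html):
    """Pull contact fields out of one page's HTML. All selectors are assumptions."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector):
        # Text of the first element matching the CSS selector, or "" if none.
        tag = soup.select_one(selector)
        return tag.get_text(" ", strip=True) if tag else ""

    def href_of(prefix):
        # Target of the first link whose href starts with the prefix, or "".
        tag = soup.select_one(f'a[href^="{prefix}"]')
        return tag["href"][len(prefix):] if tag else ""

    return {
        "name": text_of("h1"),          # assumed: company name is the first <h1>
        "address": text_of("address"),  # assumed: an <address> element exists
        "number": href_of("tel:"),      # first tel: link, if any
        "mail": href_of("mailto:"),     # first mailto: link, if any
    }


def scrape(links):
    """Fetch every link and return one dictionary per page."""
    data = []
    for link in links:
        page = requests.get(link, timeout=10)
        page.raise_for_status()  # fail loudly on HTTP errors
        data.append(parse_partner(page.text))  # once per link, inside the loop
    return data
```

Calling `scrape(links)` with the list above would return one dictionary per company. Fields come back empty wherever the guessed selectors do not match, which is a sign that they need adjusting after inspecting the page source.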

To learn how to extract data with Beautiful Soup, I suggest reading this tutorial: Beautiful Soup: Build a Web Scraper With Python