How to fix my Python code to look for occurrences across multiple URLs and not just one

Time:07-06

So, first time doing anything in Python, and any coding language for that matter.

I want to count how many lawyers at a certain company attended each of several schools.

What I've got so far:

import requests
from bs4 import BeautifulSoup
import re

page = requests.get("https://www.mannheimerswartling.se/medarbetare/")
soup = BeautifulSoup(page.text, 'html.parser')
solo_body = soup.body
body = solo_body.text.lower()
stockholmcount = (body.count("stockholms uni"))
lundcount = (body.count("lunds uni"))
uppsalacount = (body.count("uppsalas uni"))
goteborgcount = (body.count("göteborgs uni"))
orebrocount = (body.count("örebros uni"))
karlstadcount = (body.count("karlstads uni"))

urls = ['https://www.mannheimerswartling.se/medarbetare/hanne-aarsheim/', 'https://www.mannheimerswartling.se/medarbetare/sarmad-abdul-nabi/']
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    solo_body = soup.body
    body = solo_body.text.lower()
    stockholmcountadd = stockholmcount + (body.count("stockholms uni"))
    lundcountadd = lundcount + (body.count("lunds uni"))
    uppsalacountadd = uppsalacount + (body.count("uppsalas uni"))
    goteborgcountadd = goteborgcount + (body.count("göteborgs uni"))
    orebrocountadd = orebrocount + (body.count("örebros uni"))
    karlstadcountadd = karlstadcount + (body.count("karlstads uni"))
print("Stockholm: " + str(stockholmcountadd))
print("Lund: " + str(lundcountadd))
print("Uppsala: " + str(uppsalacountadd))
print("Göteborg: " + str(goteborgcountadd))
print("Örebro: " + str(orebrocountadd))
print("Karlstad: " + str(karlstadcountadd))

What I've noticed, though, is that the output only reflects one of the URLs. If I add a "break", it changes which URL the script acts on, but it never covers both.

I'd greatly appreciate any help!

Edited for formatting purposes.

Answer:

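Your loop assigns the ...countadd variables afresh on every pass instead of adding to a running total, so the printed values only reflect the last URL. You could build a dictionary keyed on the search phrases and accumulate the counts inside a loop over the URLs, like this: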

import requests
from bs4 import BeautifulSoup

urls = ['https://www.mannheimerswartling.se/medarbetare/', 'https://www.mannheimerswartling.se/medarbetare/hanne-aarsheim/', 'https://www.mannheimerswartling.se/medarbetare/sarmad-abdul-nabi/']

phrases = ["stockholms uni", "lunds uni", "uppsalas uni", "göteborgs uni", "örebros uni", "karlstads uni"]

results = {}

for url in urls:
    (r := requests.get(url)).raise_for_status()
    body = BeautifulSoup(r.content, 'lxml').body.text.lower()
    for phrase in phrases:
        results[phrase] = results.get(phrase, 0) + body.count(phrase)

print(results)

Output:

{'stockholms uni': 1, 'lunds uni': 1, 'uppsalas uni': 0, 'göteborgs uni': 0, 'örebros uni': 0, 'karlstads uni': 0}
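To see why accumulating matters, here is a small sketch that needs no network access. The `pages` list is a made-up stand-in for the lowercased body text of three pages, so the strings are assumptions and not real content from the site. The first loop mirrors the pattern in the question (a fresh assignment on each pass), the second adds to a running total.

```python
# Hypothetical stand-ins for the lowercased body text of three pages.
pages = [
    "jur. kand., stockholms universitet",
    "juristexamen, stockholms universitet",
    "jur. kand., lunds universitet",
]
phrase = "stockholms uni"
base_count = 0  # plays the role of the count taken before the loop

# Overwriting: each pass replaces the previous value,
# so only the last page contributes to the result.
overwritten = base_count
for body in pages:
    overwritten = base_count + body.count(phrase)

# Accumulating: each pass adds to the running total,
# so every page contributes.
accumulated = base_count
for body in pages:
    accumulated += body.count(phrase)

print("overwritten:", overwritten)
print("accumulated:", accumulated)
```

With these sample strings the overwritten value comes from the last page alone, while the accumulated value covers all three.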

Note:

You'll need to do some work on the output, but you should get the idea.
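One possible way to tidy the output, as a sketch: map each search phrase to a display label and print one line per school, in the style of the question's print statements. The `labels` mapping and the `format_results` helper are hypothetical names introduced here, and `results` is hard-coded to the output shown above so the snippet runs on its own.

```python
# Counts in the same shape as the `results` dictionary printed above.
results = {
    "stockholms uni": 1,
    "lunds uni": 1,
    "uppsalas uni": 0,
    "göteborgs uni": 0,
    "örebros uni": 0,
    "karlstads uni": 0,
}

# Hypothetical mapping from search phrase to a display label.
labels = {
    "stockholms uni": "Stockholm",
    "lunds uni": "Lund",
    "uppsalas uni": "Uppsala",
    "göteborgs uni": "Göteborg",
    "örebros uni": "Örebro",
    "karlstads uni": "Karlstad",
}


def format_results(results, labels):
    """Return one 'Label: count' string per phrase, in insertion order."""
    # Fall back to the raw phrase if no label has been defined for it.
    return [
        f"{labels.get(phrase, phrase)}: {count}"
        for phrase, count in results.items()
    ]


lines = format_results(results, labels)
print("\n".join(lines))
```

Keeping the labels in a separate dictionary means a new school only needs one extra entry in `phrases` and one in `labels`, with no new variables or print statements.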
