This is my first time doing anything in Python, or in any programming language for that matter.
I want to count how many lawyers at a certain firm attended each of several universities.
Here is what I've got so far:
import requests
from bs4 import BeautifulSoup
import re
page = requests.get("https://www.mannheimerswartling.se/medarbetare/")
soup = BeautifulSoup(page.text, 'html.parser')
solo_body = soup.body
body = solo_body.text.lower()
stockholmcount = (body.count("stockholms uni"))
lundcount = (body.count("lunds uni"))
uppsalacount = (body.count("uppsalas uni"))
goteborgcount = (body.count("göteborgs uni"))
orebrocount = (body.count("örebros uni"))
karlstadcount = (body.count("karlstads uni"))
urls = ['https://www.mannheimerswartling.se/medarbetare/hanne-aarsheim/', 'https://www.mannheimerswartling.se/medarbetare/sarmad-abdul-nabi/']
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    solo_body = soup.body
    body = solo_body.text.lower()
    stockholmcountadd = stockholmcount + (body.count("stockholms uni"))
    lundcountadd = lundcount + (body.count("lunds uni"))
    uppsalacountadd = uppsalacount + (body.count("uppsalas uni"))
    goteborgcountadd = goteborgcount + (body.count("göteborgs uni"))
    orebrocountadd = orebrocount + (body.count("örebros uni"))
    karlstadcountadd = karlstadcount + (body.count("karlstads uni"))
print("Stockholm: " + str(stockholmcountadd))
print("Lund: " + str(lundcountadd))
print("Uppsala: " + str(uppsalacountadd))
print("Göteborg: " + str(goteborgcountadd))
print("Örebro: " + str(orebrocountadd))
print("Karlstad: " + str(karlstadcountadd))
What I've noticed, though, is that the output only reflects one of the URLs. If I add a "break", it changes which URL the script acts on, but it never counts both.
I'd appreciate any help!
CodePudding user response:
You could build a dictionary keyed on the search phrases, then put the whole thing into a single loop like this:
import requests
from bs4 import BeautifulSoup
urls = ['https://www.mannheimerswartling.se/medarbetare/', 'https://www.mannheimerswartling.se/medarbetare/hanne-aarsheim/', 'https://www.mannheimerswartling.se/medarbetare/sarmad-abdul-nabi/']
phrases = ["stockholms uni", "lunds uni", "uppsalas uni", "göteborgs uni", "örebros uni", "karlstads uni"]
results = {}
for url in urls:
    (r := requests.get(url)).raise_for_status()
    body = BeautifulSoup(r.content, 'lxml').body.text.lower()
    for phrase in phrases:
        results[phrase] = results.get(phrase, 0) + body.count(phrase)
print(results)
Output:
{'stockholms uni': 1, 'lunds uni': 1, 'uppsalas uni': 0, 'göteborgs uni': 0, 'örebros uni': 0, 'karlstads uni': 0}
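As far as I can tell, the reason the code in the question only ever reflects one URL is that each pass through the loop recomputes the "...add" variables from the unchanged index-page count, so the last URL overwrites whatever the earlier ones contributed (assuming the print calls sit after the loop). Here is a minimal offline sketch of the difference. The page texts are made up and simply stand in for the downloaded, lowercased bodies:

```python
# Made-up page texts standing in for the lowercased bodies of three pages.
pages = [
    "stockholms universitet ... lunds universitet",
    "lunds universitet",
    "stockholms universitet",
]
phrase = "lunds uni"

# Overwriting: every pass starts again from the first page's count,
# so only the first page and the last page end up in the result.
base = pages[0].count(phrase)
for text in pages[1:]:
    overwritten = base + text.count(phrase)

# Accumulating: the running total carries over from one pass to the next.
total = 0
for text in pages:
    total += text.count(phrase)

print(overwritten, total)
```

The dictionary version above accumulates in the same way, because results.get(phrase, 0) reads back the running total before adding to it.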
Note:
You'll need to do some work on the output, but you should get the idea.
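One possible way to tidy up the output, as a rough sketch only: map each search phrase to a display label and print one line per university. The labels dictionary and the format_results helper are hypothetical names introduced for illustration, and the results shown are sample values rather than real counts:

```python
# Sample results in the same shape as the dictionary built by the loop above.
results = {'stockholms uni': 1, 'lunds uni': 1, 'uppsalas uni': 0}

# Hypothetical mapping from search phrase to the label to display.
labels = {
    'stockholms uni': 'Stockholm',
    'lunds uni': 'Lund',
    'uppsalas uni': 'Uppsala',
}

def format_results(results, labels):
    """Return one 'Label: count' string per phrase, in insertion order."""
    # Fall back to the raw phrase when no label has been defined for it.
    return [f"{labels.get(phrase, phrase)}: {count}"
            for phrase, count in results.items()]

lines = format_results(results, labels)
print("\n".join(lines))
```

This keeps the counting and the presentation separate, so adding another university only means adding one phrase and one label.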