As an amateur, I have been working on a small coding project for fun. I want to scrape quite a lot of data, and with the help of StackOverflow I have a script that works fairly well. However, I am still missing one big step: I want to find the titles of certain images on the webpage. I can already gather all the other data I need (marked in red in the screenshot). All I still need are the titles of the 3x2 grid of images (three per team). See the screenshot below:
The image titles are not defined by a 'class', which makes it hard for me to find them. I tried using:
    for KTA in soup('img'):
        KTAclass = KTA.get('title')
This does work, but it also returns a lot of None values in addition to the titles I'm looking for, because .get('title') returns None for every image that has no title attribute.
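One possible way to skip the None values is to select only the images that actually carry a title attribute, using the CSS attribute selector img[title]. The snippet below is a minimal sketch on made-up markup (SAMPLE_HTML is hypothetical and only stands in for the real page); it may still pick up titled images elsewhere on the page that are not wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<div><img src="logo.png"></div>
<div class="class"><img src="a.png" title="roublard"></div>
<div class="class"><img src="b.png" title="feca"></div>
<img src="banner.png">
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "img[title]" matches only <img> tags that have a title attribute,
# so images without one never produce a None entry.
titles = [img["title"] for img in soup.select("img[title]")]
print(titles)
```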
My current script looks like this:
    import requests
    from bs4 import BeautifulSoup

    def analyze(i):
        url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
        page = requests.get(url)
        soup = BeautifulSoup(page.content, "html.parser")
        names = [a.text for a in soup.select(".name a")]
        points = [p.text for p in soup.select(".result .points")]
        arena = soup.find("span", attrs=('name')).text
        print(*zip(names, points), arena)

    for i in range(46270, 46273):
        analyze(i)
Can anyone help me out here? Ideally, I would like to add the three image titles per team to the zipped output that currently contains the team name and points.
Cheers!
CodePudding user response:
I'm not completely sure I understand you correctly, but you can get the titles of the six images like this:
    image_titles = [elem.find("img").get("title") for elem in soup.find_all("div", {"class": "class"})]
which gives you:
    ['roublard', 'huppermage', 'ecaflip', 'steamer', 'feca', 'sacrieur']
If you have any questions or I misunderstood anything, please ask :)
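To attach the titles to each team, one option is to split the flat list into groups of three and zip those groups with the names and points. This is only a sketch: group_by_team is a hypothetical helper, and it assumes the images appear in page order with the first three belonging to the first team. The sample values are copied from the outputs shown in this thread.

```python
def group_by_team(names, points, image_titles, per_team=3):
    """Pair each team with its points and its slice of image titles.

    Assumes image_titles is ordered team by team (an assumption about
    the page layout, not something the page guarantees).
    """
    chunks = [
        image_titles[start:start + per_team]
        for start in range(0, len(image_titles), per_team)
    ]
    return list(zip(names, points, chunks))


names = ["Thunder", "Tweaps"]
points = ["0 pts", "60 pts"]
image_titles = ["roublard", "huppermage", "ecaflip", "steamer", "feca", "sacrieur"]

rows = group_by_team(names, points, image_titles)
for row in rows:
    print(row)
```

Inside analyze(), the same helper could be called with the names, points and image_titles lists built from the soup.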
CodePudding user response:
This should do it:
    import requests
    from bs4 import BeautifulSoup

    def analyze(i):
        url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
        page = requests.get(url)
        soup = BeautifulSoup(page.content, "html.parser")
        arena = soup.find("span", attrs=('name')).text

        # First team: name, points, and the images whose parent's
        # class attribute ends with 'dead'.
        title = soup.select_one("[class='team'] .name a").text
        point = soup.select(".result .points")[0].text
        image_titles = ', '.join([img['title'] for img in soup.select("[class$='dead'] > img")])

        # Second team: name, points, and the images whose parent's
        # class attribute is exactly 'class'.
        title_ano = soup.select("[class='team'] .name a")[1].text
        point_ano = soup.select(".result .points")[1].text
        image_titles_ano = ', '.join([img['title'] for img in soup.select("[class='class'] > img")])

        print((title, point, image_titles), (title_ano, point_ano, image_titles_ano), arena)

    for i in range(46270, 46274):
        analyze(i)
Prints:
('Thunder', '0 pts', 'roublard, huppermage, ecaflip') ('Tweaps', '60 pts', 'steamer, feca, sacrieur') A10
('Shadow Zoo', '0 pts', 'feca, osamodas, ouginak') ('UndisClosed', '60 pts', 'eniripsa, sram, pandawa') A10
('Laugh Tale', '0 pts', 'osamodas, ecaflip, iop') ('FromTheAbyss', '60 pts', 'roublard, steamer, huppermage') A10
('Motamawa', '0 pts', 'osamodas, iop, pandawa') ('Espoo', '60 pts', 'roublard, ecaflip, sacrieur') A10
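For reference, the two attribute selectors used above behave differently from a plain class selector, which may matter if the 'dead' marker does not always sit on the first team. The sketch below uses made-up markup (the exact class values, such as "class dead", are an assumption about the real page) to show which elements each selector picks up.

```python
from bs4 import BeautifulSoup

# Made-up markup: the real page's class values are an assumption here.
SAMPLE_HTML = """
<div class="class dead"><img title="roublard"></div>
<div class="class dead"><img title="huppermage"></div>
<div class="class"><img title="steamer"></div>
<div class="class"><img title="feca"></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def titles(selector):
    """Return the title of every <img> matched by the CSS selector."""
    return [img["title"] for img in soup.select(selector)]


# Class attribute ends with 'dead'.
ends_with_dead = titles("[class$='dead'] > img")
# Class attribute is exactly the string 'class'.
exactly_class = titles("[class='class'] > img")
# Any element that has 'class' among its classes, in document order.
any_class = titles(".class > img")

print(ends_with_dead)
print(exactly_class)
print(any_class)
```

If the markers turn out to depend on the match result rather than on the team, selecting every image with .class and splitting the list by position may be the safer choice.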