Scraping img titles, but not all

As an amateur, I have been working on a small coding project for fun. I want to scrape quite a lot of data, and with the help of StackOverflow I have a script that works pretty well. However, I am still missing one big step: I want to find the titles of certain images on the webpage. I can already gather all the other data I need (marked in red in the screenshot). All I still need are the titles of the six images, laid out as a 3x2 grid. See the screenshot below:

The image titles are not defined by a 'class', which makes it hard for me to find them. I tried using

for KTA in soup('img'):
    KTAclass = KTA.get('title')

This works, but it also returns a lot of None values in addition to the titles I'm looking for.
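A side note on the None values: they come from img tags that have no title attribute at all. One way to skip them is to ask BeautifulSoup only for images that carry the attribute. The following is a minimal sketch; the HTML fragment is made up for illustration and is not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
html = """
<div><img src="logo.png"></div>
<div class="class"><img src="a.png" title="roublard"></div>
<div class="class"><img src="b.png" title="feca"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# title=True keeps only <img> tags that actually have a title attribute.
titles = [img["title"] for img in soup.find_all("img", title=True)]

# The CSS attribute selector img[title] expresses the same filter.
titles_css = [img["title"] for img in soup.select("img[title]")]

print(titles)  # ['roublard', 'feca']
```

This removes the None entries, but it would still return any other titled image on the page, so a narrower selector may be needed on the real site.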

My current script looks like this:

import requests
from bs4 import BeautifulSoup


def analyze(i):
    url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    names = [a.text for a in soup.select(".name a")]
    points = [p.text for p in soup.select(".result .points")]
    # attrs=('name') is just the string 'name', which BeautifulSoup treats as a class filter
    arena = soup.find("span", class_="name").text

    print(*zip(names, points), arena)
    

for i in range(46270, 46273):  
    analyze(i)

Can anyone help me out here? Ideally, I would like to add the three image titles per team to the zipped tuples that currently contain the team name and points.

Cheers!

CodePudding user response:

I'm not completely sure I understand you correctly.

You can get the titles of the six images like this:

image_titles = [elem.find("img").get("title") for elem in soup.find_all("div", {"class": "class"})]

which gives you:

['roublard', 'huppermage', 'ecaflip', 'steamer', 'feca', 'sacrieur']

for this example page
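To attach these titles to the team names and points, the flat list can be split into groups of three and zipped with the other two lists. Here is a small sketch using the sample values from this thread; the helper name chunk is hypothetical, and it assumes the first three titles belong to the first team and the next three to the second.

```python
# Sample values taken from the example output in this thread.
names = ["Thunder", "Tweaps"]
points = ["0 pts", "60 pts"]
image_titles = ["roublard", "huppermage", "ecaflip", "steamer", "feca", "sacrieur"]


def chunk(items, size):
    """Split a list into consecutive slices of `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Assumption: titles appear in page order, three per team.
teams = list(zip(names, points, chunk(image_titles, 3)))

for team in teams:
    print(team)
# ('Thunder', '0 pts', ['roublard', 'huppermage', 'ecaflip'])
# ('Tweaps', '60 pts', ['steamer', 'feca', 'sacrieur'])
```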

If you have any questions or I misunderstood anything, please ask :)

CodePudding user response:

This should do it:

import requests
from bs4 import BeautifulSoup

def analyze(i):
    url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    arena = soup.find("span", class_="name").text

    # First team: name, points, and the icons whose class attribute ends in "dead"
    title = soup.select_one("[class='team'] .name a").text
    point = soup.select(".result .points")[0].text
    image_titles = ', '.join(img['title'] for img in soup.select("[class$='dead'] > img"))

    # Second team: name, points, and the icons whose class attribute is exactly "class"
    title_ano = soup.select("[class='team'] .name a")[1].text
    point_ano = soup.select(".result .points")[1].text
    image_titles_ano = ', '.join(img['title'] for img in soup.select("[class='class'] > img"))

    print((title, point, image_titles), (title_ano, point_ano, image_titles_ano), arena)

for i in range(46270, 46274):  
    analyze(i)

Prints:

('Thunder', '0 pts', 'roublard, huppermage, ecaflip') ('Tweaps', '60 pts', 'steamer, feca, sacrieur') A10
('Shadow Zoo', '0 pts', 'feca, osamodas, ouginak') ('UndisClosed', '60 pts', 'eniripsa, sram, pandawa') A10
('Laugh Tale', '0 pts', 'osamodas, ecaflip, iop') ('FromTheAbyss', '60 pts', 'roublard, steamer, huppermage') A10
('Motamawa', '0 pts', 'osamodas, iop, pandawa') ('Espoo', '60 pts', 'roublard, ecaflip, sacrieur') A10
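The two attribute selectors in the script above split the icons by the exact text of their class attribute, not by team. The sketch below shows the difference on a made-up fragment. The markup, including the "class dead" value, is an assumption inferred from the selectors and is not copied from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one icon with class="class dead", one with class="class".
html = """
<div class="class dead"><img title="roublard"></div>
<div class="class"><img title="steamer"></div>
"""
soup = BeautifulSoup(html, "html.parser")


def titles(selector):
    """Return the title of every <img> matched by the CSS selector."""
    return [img["title"] for img in soup.select(selector)]


# [class$='dead'] matches when the whole attribute string ends with "dead".
ends_with_dead = titles("[class$='dead'] > img")

# [class='class'] matches only when the attribute string is exactly "class".
exactly_class = titles("[class='class'] > img")

# .class matches any element whose class list contains "class".
any_class = titles(".class > img")

print(ends_with_dead)  # ['roublard']
print(exactly_class)   # ['steamer']
print(any_class)       # ['roublard', 'steamer']
```

If that assumption holds, the split lines up with the teams only when every icon of one team is marked dead and none of the other team is. For matches where that is not the case, taking all six titles in page order and grouping them in threes may be the safer route.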