How to handle tables with the same class differently in BeautifulSoup

Time:12-20

I am working my way into web scraping and have written the code below. The web page has 12 tables (all with the class acta-table) that I would like to drill into a little further, and I would appreciate some help on how to handle each one differently. The tables for Gols and Targetes should be processed differently from Titulars, Suplents, Equip Técnic, ...

from bs4 import BeautifulSoup
import requests
import openpyxl

excel = openpyxl.Workbook()
# print(excel.sheetnames)
sheet = excel.active
sheet.title = "Acta Partido"
sheet.append(['Equipo Local', '', '', 'Equipo Visitante'])
# print (excel.sheetnames)

try:

    source = requests.get(
        'https://www.fcf.cat/acta/2022/futbol-11/cadet-primera-divisio/grup-2/1c/sant-ignasi-ce-a/1c/lhospitalet-centre-esports-b')

    source.raise_for_status()

    soup = BeautifulSoup(source.text, 'html.parser')

    actaEquipos = soup.find_all('div', class_='acta-equip')
    actaMarcador = soup.find('div', class_='acta-marcador').text.split("-")
    acta = soup.find_all(name='table', class_='acta-table')

    actaTitulo = soup.find('span', class_='apex').text.split("-")
    sheet.append([actaTitulo[0].strip(), actaMarcador[0].strip(),
                 actaMarcador[1].strip(), actaTitulo[1].strip()])

    for titulars in acta:
        print(titulars.getText())

except Exception as e:
    print(e)

excel.save('ActaPartido.xlsx')

Thanks,

Answer:

I think you can simply check what each table is about by looking at its header, and branch on that:

for t in soup.select('table.acta-table'):
    if 'Gols' in t.thead.text:
        print('do something special with gols')
    elif 'Targetes' in t.thead.text:
        print('do something special with targetes')
    else:
        print('do almost the same with the rest')
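If the list of special cases grows, the same check can be written as a lookup from header keyword to handler function. This is only a minimal sketch: the handler names and the sample rows are hypothetical, and plain strings and lists stand in for the parsed tables.

```python
def handle_gols(rows):
    # Hypothetical handler: tag every row as a goal
    return [("gol", row) for row in rows]


def handle_targetes(rows):
    # Hypothetical handler: tag every row as a card
    return [("targeta", row) for row in rows]


def handle_default(rows):
    # Fallback for Titulars, Suplents, Equip Técnic, ...
    return [("other", row) for row in rows]


# Keyword expected in the table header -> function that processes its rows
HANDLERS = {"Gols": handle_gols, "Targetes": handle_targetes}


def pick_handler(header_text):
    """Return the first handler whose keyword occurs in the header text."""
    for keyword, handler in HANDLERS.items():
        if keyword in header_text:
            return handler
    return handle_default


# header_text stands in for t.thead.text from the loop above
print(pick_handler("Gols")([["Player A", "12'"]]))
print(pick_handler("Titulars")([["1", "Player C"]]))
```

Adding a new special case then means adding one function and one dictionary entry, without touching the loop.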

Example

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.fcf.cat/acta/2022/futbol-11/cadet-primera-divisio/grup-2/1c/sant-ignasi-ce-a/1c/lhospitalet-centre-esports-b')
source.raise_for_status()

soup = BeautifulSoup(source.text, 'html.parser')
    
for t in soup.select('table.acta-table'):
    if 'Gols' in t.thead.text:
        for x in t.select('tr:not(:has(th))'):
            print(list(x.stripped_strings))
    elif 'Targetes' in t.thead.text:
        for x in t.select('tr:not(:has(th))'):
            print(list(x.stripped_strings))
    else:
        for x in t.select('tr:not(:has(th))'):
            print(list(x.stripped_strings))
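To try the branching without requesting the live page, a small offline sketch may help. The HTML below is a made-up, simplified stand-in for the real markup, and `SAMPLE_HTML` and `grouped` are names chosen for illustration. It also guards against a table that has no thead, in which case `t.thead.text` would raise an AttributeError, and it skips header rows with `find('th')` instead of the `:has()` selector.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page may be structured differently
SAMPLE_HTML = """
<table class="acta-table">
  <thead><tr><th>Gols</th></tr></thead>
  <tbody><tr><td>Player A</td><td>12'</td></tr></tbody>
</table>
<table class="acta-table">
  <thead><tr><th>Targetes</th></tr></thead>
  <tbody><tr><td>Player B</td><td>40'</td></tr></tbody>
</table>
<table class="acta-table">
  <tbody><tr><td>1</td><td>Player C</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
grouped = {"gols": [], "targetes": [], "other": []}

for t in soup.find_all("table", class_="acta-table"):
    # A table without <thead> gets an empty header and falls through to "other"
    header = t.thead.get_text() if t.thead else ""

    # Keep only data rows, i.e. rows that contain no <th> cell
    rows = [list(tr.stripped_strings)
            for tr in t.find_all("tr")
            if tr.find("th") is None]

    if "Gols" in header:
        grouped["gols"].extend(rows)
    elif "Targetes" in header:
        grouped["targetes"].extend(rows)
    else:
        grouped["other"].extend(rows)

print(grouped)
```

Once the rows are grouped like this, each group can be processed or written to the workbook separately.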