Web scraping search with Python BeautifulSoup or Selenium

Time:03-18

I am trying to create a web crawler that collects battle win/loss data for different superheroes from the website https://www.superherodb.com/battle/create/#close.

I have already scraped all the superhero names. Now I want to take each character in turn and collect the results of its battles against every other character: Superman vs all, Thor vs all, and so on.

For example, https://www.superherodb.com/superman-vs-thor/90-103/ contains stats on Superman vs Thor.
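
The battle URL appears to combine both characters' slugs and numeric IDs (superman-vs-thor, then 90-103), so "each character vs all" can be enumerated by generating every unordered pair and building one URL per pair. A minimal sketch, assuming you already have each character's slug and ID from your name scrape; the URL pattern is inferred from that single example, and `battle_urls`, `characters`, and the "Hero C" entry are hypothetical:

```python
from itertools import combinations

BASE = "https://www.superherodb.com"

def battle_urls(characters):
    """Yield (name_a, name_b, url) for every unordered pair of characters.

    `characters` maps a display name to a (slug, id) tuple. The pattern
    /{slug_a}-vs-{slug_b}/{id_a}-{id_b}/ is an assumption based only on
    https://www.superherodb.com/superman-vs-thor/90-103/.
    """
    for (name_a, (slug_a, id_a)), (name_b, (slug_b, id_b)) in combinations(characters.items(), 2):
        yield name_a, name_b, f"{BASE}/{slug_a}-vs-{slug_b}/{id_a}-{id_b}/"

# Superman (90) and Thor (103) come from the example URL; "Hero C" is a placeholder.
characters = {
    "Superman": ("superman", 90),
    "Thor": ("thor", 103),
    "Hero C": ("hero-c", 1),
}

for a, b, url in battle_urls(characters):
    print(a, "vs", b, url)
```

`combinations` visits each matchup once (Superman vs Thor but not Thor vs Superman); if the site reports A-vs-B and B-vs-A differently, `itertools.permutations` would cover both orders at twice the number of requests.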

If possible, how can I scrape the data in an organized, clean way so that I end up with it in dict form, for example: {"Superman_vs_Thor": [46, 2, 52]}, {"Superman_vs_Spiderman": [98, 2]}?
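
Building that dict is mostly a matter of pulling the numbers out of the scraped result text and choosing a key. A minimal sketch, assuming each result element's text contains its percentage as the first whole number (e.g. "46%"); the real text on the site may differ, and `battle_record` is a hypothetical helper:

```python
import re

def battle_record(name_a, name_b, result_texts):
    """Return {"A_vs_B": [numbers...]} from the scraped result strings.

    Takes the first whole number in each string ("46%" -> 46) and skips
    strings with no digits. Assumes the site shows whole-number percentages.
    """
    numbers = []
    for text in result_texts:
        match = re.search(r"\d+", text)
        if match:
            numbers.append(int(match.group()))
    # Spaces in names become underscores, e.g. "Iron Man" -> "Iron_Man"
    key = f"{name_a.strip()}_vs_{name_b.strip()}".replace(" ", "_")
    return {key: numbers}

# Example texts in the order: first character, draw, second character.
print(battle_record("Superman", "Thor", ["46%", "2%", "52%"]))
# {'Superman_vs_Thor': [46, 2, 52]}
```

Rather than a list of single-key dicts, it is usually easier to merge every battle into one dict with `all_battles.update(battle_record(...))`, so any matchup can be looked up by its key.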

Answer:

I wasn't able to convert the info you need into a dict, but I was able to scrape it. Here's the code:

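from bs4 import BeautifulSoup
import requests

url = 'https://www.superherodb.com/superman-vs-thor/90-103/'
# A browser-like User-Agent; some sites reject the default python-requests one
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()  # fail early on 403/404 instead of parsing an error page
soup = BeautifulSoup(r.text, 'lxml')  # needs the lxml package; 'html.parser' works without it

battle = soup.find('h1', class_='h1-battle')
# These win/lose classes match this page (Superman 'lose', Thor 'win');
# on other battles they may be swapped
superman = soup.find('div', class_='battle-team-result lose')
thor = soup.find('div', class_='battle-team-result win')
draw = soup.find('div', class_='battle-team-result draw')

print('Battle:', battle.get_text(strip=True))
print('Superman stats:', superman.get_text(strip=True))
print('Thor stats:', thor.get_text(strip=True))
print('Draw:', draw.get_text(strip=True))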
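
Because the `win`/`lose` classes depend on who won, `superman` would grab the wrong div on a battle Superman wins. For crawling many pages it is safer to take every `battle-team-result` div in page order, which the Selenium answer below suggests is first character, draw, second character (worth checking on a few pages). With BeautifulSoup that is `soup.find_all('div', class_='battle-team-result')`. The sketch below does the same thing with only the standard library's `html.parser`, so it runs without extra packages; `BattleResultParser` and `battle_results` are hypothetical names, and the sample HTML is hand-written, not the site's real markup:

```python
from html.parser import HTMLParser

class BattleResultParser(HTMLParser):
    """Collect the text of every <div> whose class list contains 'battle-team-result'."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._depth = 0  # > 0 while inside a matching div; counts nested divs

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:  # a div nested inside a result div
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "battle-team-result" in classes:
            self._depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.results[-1] += data

def battle_results(html):
    """Return the stripped text of each battle-team-result div, in page order."""
    parser = BattleResultParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]

sample = ('<div class="battle-team-result lose"><span>46</span>%</div>'
          '<div class="battle-team-result draw">2%</div>'
          '<div class="battle-team-result win">52%</div>')
print(battle_results(sample))  # ['46%', '2%', '52%']
```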

Answer:

Try this with Selenium:

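from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://www.superherodb.com/superman-vs-thor/90-103/")
    title = driver.find_element(By.CLASS_NAME, "h1-battle").text
    # Title is expected to read like "Superman vs Thor"; strip spaces around each name
    characters = [name.strip() for name in title.split("vs")]
    # Result boxes in page order: first character, draw, second character
    results = driver.find_elements(By.CLASS_NAME, "battle-team-result")

    print('Title:', title)
    print(characters[0] + ': ' + results[0].text)
    print('Draw: ' + results[1].text)
    print(characters[1] + ': ' + results[2].text)
finally:
    driver.quit()  # always close the browser, even if a lookup fails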
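
To cover every matchup, the page fetch can go inside a loop that reuses one browser (or one `requests.Session`), pauses between requests, and merges each battle into a single dict. A sketch, assuming `pairs` is any iterable of (name_a, name_b, url) tuples and `scrape_one(url)` is a callable you write that returns the numbers for one page; `crawl`, `scrape_one`, and the 1-second default delay are illustrative choices, not anything the site documents:

```python
import time

def crawl(pairs, scrape_one, delay=1.0):
    """Scrape every (name_a, name_b, url) in `pairs` into one dict.

    `scrape_one(url)` must return a list of numbers for that battle page.
    Failed pages are recorded and skipped so one bad page doesn't stop the crawl.
    """
    battles, failed = {}, []
    for name_a, name_b, url in pairs:
        try:
            battles[f"{name_a}_vs_{name_b}"] = scrape_one(url)
        except Exception as exc:  # network errors, missing elements, bad text, etc.
            failed.append((url, repr(exc)))
        time.sleep(delay)  # be polite: pause between page loads
    return battles, failed

# Offline demo with a fake scraper; the second URL is deliberately missing.
fake_pages = {"https://example.com/a-vs-b/": [46, 2, 52]}
battles, failed = crawl(
    [("A", "B", "https://example.com/a-vs-b/"), ("A", "C", "https://example.com/missing/")],
    lambda url: fake_pages[url],
    delay=0,
)
print(battles)  # {'A_vs_B': [46, 2, 52]}
print(failed)   # one entry for the missing URL
```

With the Selenium code above, `scrape_one` would call `driver.get(url)`, collect `driver.find_elements(By.CLASS_NAME, "battle-team-result")`, and convert each element's `.text` to a number; create the driver once before the loop and call `driver.quit()` once after it, instead of launching Chrome for every battle.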