How to scrape all ufc fighters and not repeat only the first fighter?

Time:08-20

I am writing a program that scrapes UFC fighters' names and details with BeautifulSoup. A for-loop iterates over the divs that hold this information and extracts specific fields from each one.

The issue I am having is when printing the data only the first name is repeated, and the correct number of names is not displayed. It repeats the name around ten times while there are definitely more than 11 names.

Current Code:

from cgitb import html
from bs4 import BeautifulSoup
import requests 

html_text = requests.get('https://www.ufc.com/athletes/all').text
soup = BeautifulSoup(html_text, "lxml")
fighters = soup.find_all('div', class_ = ("node node--type-athlete node--view-mode-all-athletes-result ds-1col clearfix"))
for fighter in enumerate(fighters, 1):
    fighter_name = soup.find('span', class_ = ("c-listing-athlete__name")).text.replace("_", " ")
    fighter_nickname = soup.find('div', class_ = ("field field--name-nickname field--type-string field--label-hidden")).text
    fighter_weight_class = soup.find('div', class_ = ("field field--name-stats-weight-class field--type-entity-reference field--label-hidden field__items")).text
    fighter_ufc_record = soup.find('span', class_ = ("c-listing-athlete__record")).text
    print(f'''Fighter Name: {fighter_name.strip()}''')
    print("************************")

Output:

Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************
Fighter Name: Asjabharan
************************

When printing "fighter" I get all the data but I'm not able to manipulate the data neatly.

CodePudding user response:

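Iterate over fighters directly instead of enumerate(fighters, 1), and call fighter.find(...) instead of soup.find(...) inside the loop. enumerate yields (index, element) tuples, and soup.find always returns the first match in the whole document, which is why the first name repeats.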

from bs4 import BeautifulSoup
import requests 

html_text = requests.get('https://www.ufc.com/athletes/all').text
soup = BeautifulSoup(html_text, "lxml")
fighters = soup.find_all('div', class_ = ("node node--type-athlete node--view-mode-all-athletes-result ds-1col clearfix"))
for fighter in fighters:
    fighter_name = fighter.find('span', class_ = ("c-listing-athlete__name")).text.replace("_", " ")
    fighter_nickname = fighter.find('div', class_ = ("field field--name-nickname field--type-string field--label-hidden"))
    fighter_nickname = fighter_nickname.text if fighter_nickname else None
    fighter_weight_class = fighter.find('div', class_ = ("field field--name-stats-weight-class field--type-entity-reference field--label-hidden field__items")).text
    fighter_ufc_record = fighter.find('span', class_ = ("c-listing-athlete__record")).text
    print(f'''Fighter Name: {fighter_name.strip()}''')
    print("************************")

Output:

Fighter Name: Asjabharan
************************
Fighter Name: Angga -
************************
Fighter Name: Danny Abbadi
************************
Fighter Name: Nariman Abbassov
************************
Fighter Name: Tank Abbott
************************
Fighter Name: Hamdy Abdelwahab
************************
Fighter Name: Shamil Abdurakhimov
************************
Fighter Name: Daichi Abe
************************
Fighter Name: Papy Abedi
************************
Fighter Name: Ricardo Abreu
************************
Fighter Name: Klidson Abreu
************************
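
If a running number for each fighter is still wanted, enumerate can stay as long as the tuple it yields is unpacked. A minimal sketch, with a hypothetical list of names standing in for the scraped elements:

```python
# Hypothetical stand-in for the elements returned by soup.find_all(...)
fighters = ["Danny Abbadi", "Nariman Abbassov", "Tank Abbott"]

# enumerate yields (index, item) tuples, so a single loop variable
# receives the whole tuple rather than the item itself
pairs = list(enumerate(fighters, 1))

# Unpacking gives the counter and the item separately
lines = [f"{number}. {fighter}" for number, fighter in enumerate(fighters, 1)]

print(pairs[0])
for line in lines:
    print(line)
```

In the scraper, the same unpacking would read "for number, fighter in enumerate(fighters, 1):", with fighter.find(...) used inside the loop.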

CodePudding user response:

"The issue I am having is when printing the data only the first name is repeated"

As already mentioned, use fighter.find(...) inside your for-loop instead of soup.find(...) so that the search is limited to the HTML of the current fighter.
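
The difference can be reproduced offline. The sketch below uses a small hand-written HTML snippet (an assumption, much simpler than the real UFC markup) and Python's built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hand-written sample markup, only loosely modelled on the real page
html_doc = """
<div class="athlete"><span class="c-listing-athlete__name">Danny Abbadi</span></div>
<div class="athlete"><span class="c-listing-athlete__name">Tank Abbott</span></div>
<div class="athlete"><span class="c-listing-athlete__name">Daichi Abe</span></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
fighters = soup.find_all("div", class_="athlete")

# soup.find searches the whole document and always returns the first match
repeated = [soup.find("span", class_="c-listing-athlete__name").text for _ in fighters]

# fighter.find searches only inside the current fighter's element
scoped = [fighter.find("span", class_="c-listing-athlete__name").text for fighter in fighters]

print(repeated)
print(scoped)
```

The first list contains the same name three times, the second one name per element.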

"It repeats the name around ten times while there are definitely more than 11 names."

The issue here is that more data is loaded via AJAX, so you have to simulate these calls to get more than just the first 11.

Example

Note: The example starts at page 245 to show that the while-loop stops once no more data is available; simply change it to start from the first page.

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
payload = {
    'view_name': 'all_athletes',
    'view_path': '/athletes/all',
    'view_display_id': 'page',
    'gender': 'All',
    'page': 245,  # start page; change it to begin from the first page
    '_drupal_ajax': 1
}

data = []

while True:
    r = requests.post('https://www.ufc.com/views/ajax?_wrapper_format=drupal_ajax', headers=headers, data=payload)
    soup = BeautifulSoup(r.json()[3]['data'], 'lxml')

    fighters = soup.select('.view-items-wrp li:has(div.c-listing-athlete-flipcard__inner)')
    for fighter in fighters:
        data.append({
            'fighter_name': fighter.find('span', class_ = ("c-listing-athlete__name")).get_text(strip=True).replace("_", " ") if fighter.find('span', class_ = ("c-listing-athlete__name")) else None,
            'fighter_nickname': fighter.find('div', class_ = ("field field--name-nickname")).get_text(strip=True) if fighter.find('div', class_ = ("field field--name-nickname")) else None,
            'fighter_weight_class': fighter.find('div', class_ = ("field--name-stats-weight-class")).get_text(strip=True) if fighter.find('div', class_ = ("field--name-stats-weight-class")) else None,
            'fighter_ufc_record': fighter.find('span', class_ = ("c-listing-athlete__record")).get_text(strip=True) if fighter.find('span', class_ = ("c-listing-athlete__record")) else None,
        })

    if not fighters:
        break
    else:
        payload['page'] += 1
        print(payload['page'])

data

Output

[{'fighter_name': 'Emmanuel Yarbrough',
  'fighter_nickname': None,
  'fighter_weight_class': None,
  'fighter_ufc_record': '0-0-0 (W-L-D)'},
 {'fighter_name': 'Cale Yarbrough',
  'fighter_nickname': None,
  'fighter_weight_class': 'Middleweight',
  'fighter_ufc_record': '0-1-0 (W-L-D)'},
 {'fighter_name': 'Ashley Yoder',
  'fighter_nickname': None,
  'fighter_weight_class': "Women's Strawweight",
  'fighter_ufc_record': '8-8-0 (W-L-D)'},
 {'fighter_name': 'Sang Hoon Yoo',
  'fighter_nickname': None,
  'fighter_weight_class': 'Lightweight',
  'fighter_ufc_record': '0-0-0 (W-L-D)'},...]
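
The stop condition of the loop can be tried without touching the network. In this sketch, fetch_page and the PAGES data are hypothetical stand-ins for the AJAX request; only the control flow mirrors the example above:

```python
# Hypothetical data: page number -> names returned for that page
PAGES = {
    0: ["Danny Abbadi", "Tank Abbott"],
    1: ["Daichi Abe"],
}


def fetch_page(page):
    """Stand-in for the AJAX request; unknown pages return an empty list."""
    return PAGES.get(page, [])


def collect_all(start_page=0):
    """Request page after page until one comes back empty."""
    results = []
    page = start_page
    while True:
        items = fetch_page(page)
        if not items:
            break
        results.extend(items)
        page += 1
    return results


all_names = collect_all()
print(all_names)
```

Starting beyond the last page, for example collect_all(5), returns an empty list right away, which is the same behaviour the note about page 245 is meant to demonstrate.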