Getting class_ when web scraping two elements that share a class

Time:09-29

I am scraping the ICC top 10 team rankings, and the points and matches cells both have the same class:

"td", class_='table-body__cell u-center-text'

How do I separate the two?

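import requests
from bs4 import BeautifulSoup

page = requests.get(url1)
page
soup1 = BeautifulSoup(page.content, "html.parser")
print(soup1.prettify())

matches = []
# the banner classes use a double hyphen, as in the second answer below
for i in soup1.find_all("td", class_='rankings-block__banner--matches'):
    matches.append(i.text)

matches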

CodePudding user response:

Simple way: use pandas

You can use pandas to read the table into a dataframe and pick the values you want:

import pandas as pd

pd.read_html('https://www.icc-cricket.com/rankings/mens/team-rankings/odi/')[0]

Alternative with bs4

matches = [x.get_text() for x in soup1.select('table.table tr td:nth-of-type(3)')]
points = [x.get_text() for x in soup1.select('table.table tr td:nth-of-type(4)')]

print(matches, points)

or

matches=[]
points=[]
for x in soup1.select('table.table tr')[1:]:
    matches.append(x.select_one('td:nth-of-type(3)').get_text())
    points.append(x.select_one('td:nth-of-type(4)').get_text())

print(matches, points)
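
The positional selectors above can be tried offline. The following is a minimal sketch that assumes a small hand-written table (SAMPLE_HTML is hypothetical and only mimics the column order Pos, Team, Matches, Points, Rating); it is not the live ICC markup, which may differ or change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same column order as the rankings table,
# with matches and points sharing one class.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Pos</th><th>Team</th><th>Matches</th><th>Points</th><th>Rating</th></tr>
  <tr>
    <td>2</td><td>England</td>
    <td class="table-body__cell u-center-text">32</td>
    <td class="table-body__cell u-center-text">3,793</td>
    <td>119</td>
  </tr>
  <tr>
    <td>3</td><td>Australia</td>
    <td class="table-body__cell u-center-text">28</td>
    <td class="table-body__cell u-center-text">3,244</td>
    <td>116</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Skip the header row, which contains <th> cells only.
rows = soup.select("table.table tr")[1:]

# td:nth-of-type(n) picks the n-th <td> in each row, whatever its class is.
matches = [row.select_one("td:nth-of-type(3)").get_text(strip=True) for row in rows]
points = [row.select_one("td:nth-of-type(4)").get_text(strip=True) for row in rows]

print(matches, points)
```

Selecting by position sidesteps the shared class entirely, at the cost of breaking if the site reorders its columns.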

CodePudding user response:

A complete solution: run this code and you get a dictionary with all the data from the table, keyed by country name:

# get the entire table
table = soup1.find('table', {'class': 'table'})

# create dictionary to hold results
rankings = {}

# separate first row since it uses different markup than the rest
position = table.find('td', {'class': 'rankings-block__banner--pos'}).text.strip()
country_name = table.find('span', {'class': 'u-hide-phablet'}).text.strip()
matches = table.find('td', {'class': 'rankings-block__banner--matches'}).text.strip()
points = table.find('td', {'class': 'rankings-block__banner--points'}).text.strip()
rating = table.find('td', {'class': 'rankings-block__banner--rating u-text-right'}).text.strip()
rankings[country_name] = {'position': position,
                          'matches': matches,
                          'points': points,
                          'rating': rating}

# for the next rows, use a loop
for row in table.find_all('tr', {'class': 'table-body'}):
    position = row.find('td', {'class': 'table-body__cell table-body__cell--position u-text-right'}).text.strip()
    country_name = row.find('span', {'class': 'u-hide-phablet'}).text.strip()
    # matches and points share a class; find_all returns them in document order
    center_cells = row.find_all('td', {'class': 'table-body__cell u-center-text'})
    matches = center_cells[0].text.strip()
    points = center_cells[1].text.strip()
    rating = row.find('td', {'class': 'table-body__cell u-text-right rating'}).text.strip()
    rankings[country_name] = {'position': position,
                              'matches': matches,
                              'points': points,
                              'rating': rating}
rankings

Which outputs:

{'New Zealand': {'position': '1',
  'matches': '17',
  'points': '2,054',
  'rating': '121'},
 'England': {'position': '2',
  'matches': '32',
  'points': '3,793',
  'rating': '119'},
 'Australia': {'position': '3',
  'matches': '28',
  'points': '3,244',
  'rating': '116'},
 'India': {'position': '4',
  'matches': '32',
  'points': '3,624',
  'rating': '113'},
 'South Africa': {'position': '5',
  'matches': '25',
  'points': '2,459',
  'rating': '98'},
 'Pakistan': {'position': '6',
  'matches': '27',
  'points': '2,524',
  'rating': '93'},
 'Bangladesh': {'position': '7',
  'matches': '30',
  'points': '2,740',
  'rating': '91'},
 'West Indies': {'position': '8',
  'matches': '30',
  'points': '2,523',
  'rating': '84'},
 'Sri Lanka': {'position': '9',
  'matches': '32',
  'points': '2,657',
  'rating': '83'},
 'Afghanistan': {'position': '10',
  'matches': '17',
  'points': '1,054',
  'rating': '62'},
 'Netherlands': {'position': '11',
  'matches': '7',
  'points': '336',
  'rating': '48'},
 'Ireland': {'position': '12',
  'matches': '25',
  'points': '1,145',
  'rating': '46'},
 'Oman': {'position': '13', 'matches': '11', 'points': '435', 'rating': '40'},
 'Scotland': {'position': '14',
  'matches': '8',
  'points': '308',
  'rating': '39'},
 'Zimbabwe': {'position': '15',
  'matches': '20',
  'points': '764',
  'rating': '38'},
 'Nepal': {'position': '16', 'matches': '11', 'points': '330', 'rating': '30'},
 'UAE': {'position': '17', 'matches': '9', 'points': '190', 'rating': '21'},
 'United States': {'position': '18',
  'matches': '14',
  'points': '232',
  'rating': '17'},
 'Namibia': {'position': '19', 'matches': '6', 'points': '97', 'rating': '16'},
 'Papua New Guinea': {'position': '20',
  'matches': '10',
  'points': '0',
  'rating': '0'}}
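
The step that actually answers the question is taking both same-class cells from one row and telling them apart by order. A minimal sketch of just that step, assuming a hand-written row (SAMPLE_ROW is hypothetical and only reuses the class names quoted in this thread):

```python
from bs4 import BeautifulSoup

# Hypothetical single row, modelled on the class names used above.
SAMPLE_ROW = """
<table><tr class="table-body">
  <td class="table-body__cell table-body__cell--position u-text-right">2</td>
  <td class="table-body__cell"><span class="u-hide-phablet">England</span></td>
  <td class="table-body__cell u-center-text">32</td>
  <td class="table-body__cell u-center-text">3,793</td>
  <td class="table-body__cell u-text-right rating">119</td>
</tr></table>
"""

row = BeautifulSoup(SAMPLE_ROW, "html.parser").find("tr", class_="table-body")

# Both cells match the same CSS selector; results keep document order,
# so the first is matches and the second is points.
matches_cell, points_cell = row.select("td.table-body__cell.u-center-text")

matches = matches_cell.get_text(strip=True)
points = points_cell.get_text(strip=True)
print(matches, points)
```

Unpacking into two names raises a ValueError if a row ever contains more or fewer than two such cells, which surfaces a markup change instead of silently shifting values.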

You can also load it into a pandas DataFrame for further analysis:

pd.DataFrame(rankings)

Which outputs:

          New Zealand  England  Australia  India  South Africa  Pakistan  Bangladesh  West Indies  Sri Lanka  Afghanistan  Netherlands  Ireland  Oman  Scotland  Zimbabwe  Nepal  UAE  United States  Namibia  Papua New Guinea
position  1            2        3          4      5             6         7           8            9          10           11           12       13    14        15        16     17   18             19       20
matches   17           32       28         32     25            27        30          30           32         17           7            25       11    8         20        11     9    14             6        10
points    2,054        3,793    3,244      3,624  2,459         2,524     2,740       2,523        2,657      1,054        336          1,145    435   308       764       330    190  232            97       0
rating    121          119      116        113    98            93        91          84           83         62           48           46       40    39        38        30     21   17             16       0
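
Every value in the scraped dictionary is still a string, and points uses a thousands separator ('2,054'), so sorting or arithmetic needs a numeric conversion first. A minimal sketch in plain Python, using a two-team excerpt of the output above; the helper name to_int is an assumption made for illustration:

```python
def to_int(text):
    """Convert a scraped number such as '2,054' to an int."""
    return int(text.replace(",", ""))


# Two entries copied from the output above.
rankings = {
    "New Zealand": {"position": "1", "matches": "17", "points": "2,054", "rating": "121"},
    "England": {"position": "2", "matches": "32", "points": "3,793", "rating": "119"},
}

# Same structure as rankings, with every field converted to int.
numeric = {
    team: {field: to_int(value) for field, value in stats.items()}
    for team, stats in rankings.items()
}

print(numeric["England"]["points"] - numeric["New Zealand"]["points"])
```

Passing the converted dictionary to pd.DataFrame(numeric).T should then give one row per team, which is usually easier to sort and filter than the team-per-column layout.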