I am scraping the ICC top 10 team rankings, and the points and matches cells share the same class:
"td", class_='table-body__cell u-center-text'
How do I split them?
import requests
from bs4 import BeautifulSoup

page = requests.get(url1)
page
soup1 = BeautifulSoup(page.content, "html.parser")
print(soup1.prettify())

matches = []
for i in soup1.find_all("td", class_='rankings-block__banner--matches'):
    matches.append(i.text)
matches
CodePudding user response:
Simple way: use pandas
You can use pandas to read the table into a dataframe and pick the values you want:
import pandas as pd
pd.read_html('https://www.icc-cricket.com/rankings/mens/team-rankings/odi/')[0]
Alternative with bs4
matches = [x.get_text() for x in soup1.select('table.table tr td:nth-of-type(3)')]
points = [x.get_text() for x in soup1.select('table.table tr td:nth-of-type(4)')]
print(matches, points)
or
matches = []
points = []
# skip the header row, then read the 3rd and 4th cell of each row
for x in soup1.select('table.table tr')[1:]:
    matches.append(x.select_one('td:nth-of-type(3)').get_text())
    points.append(x.select_one('td:nth-of-type(4)').get_text())
print(matches, points)
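To see why position is enough to split two cells that share a class, here is a small offline sketch. The HTML is a made-up fragment modelled on the class names quoted in the question, so the real page may differ; within each row, the first matching cell is treated as matches and the second as points.

```python
from bs4 import BeautifulSoup

# Hypothetical markup based on the class names in the question;
# the live page may be structured differently.
html = """
<table class="table">
  <tr><th>Pos</th><th>Team</th><th>Matches</th><th>Points</th><th>Rating</th></tr>
  <tr class="table-body">
    <td class="table-body__cell">2</td>
    <td class="table-body__cell">England</td>
    <td class="table-body__cell u-center-text">32</td>
    <td class="table-body__cell u-center-text">3,793</td>
    <td class="table-body__cell u-text-right rating">119</td>
  </tr>
  <tr class="table-body">
    <td class="table-body__cell">3</td>
    <td class="table-body__cell">Australia</td>
    <td class="table-body__cell u-center-text">28</td>
    <td class="table-body__cell u-center-text">3,244</td>
    <td class="table-body__cell u-text-right rating">116</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

matches, points = [], []
for row in soup.select("tr.table-body"):
    # Both cells carry the same class string, so tell them apart by order.
    cells = row.find_all("td", class_="table-body__cell u-center-text")
    matches.append(cells[0].get_text(strip=True))
    points.append(cells[1].get_text(strip=True))

print(matches)
print(points)
```

Passing the full class string to class_ only matches when the attribute is written in exactly that order. If the order might vary, a CSS selector such as row.select("td.u-center-text") is a more forgiving choice.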
CodePudding user response:
Here is a complete solution. Run this code and you will get a dictionary with all the data from the table, keyed by country:
# get the entire table
table = soup1.find('table', {'class': 'table'})

# create dictionary to hold results
rankings = {}

# separate first row since it uses different markup than the rest
position = table.find('td', {'class': 'rankings-block__banner--pos'}).text.strip()
country_name = table.find('span', {'class': 'u-hide-phablet'}).text.strip()
matches = table.find('td', {'class': 'rankings-block__banner--matches'}).text.strip()
points = table.find('td', {'class': 'rankings-block__banner--points'}).text.strip()
rating = table.find('td', {'class': 'rankings-block__banner--rating u-text-right'}).text.strip()
rankings[country_name] = {'position': position,
                          'matches': matches,
                          'points': points,
                          'rating': rating}

# for the next rows, use a loop
for row in table.find_all('tr', {'class': 'table-body'}):
    position = row.find('td', {'class': 'table-body__cell table-body__cell--position u-text-right'}).text.strip()
    country_name = row.find('span', {'class': 'u-hide-phablet'}).text.strip()
    matches = row.find_all('td', {'class': 'table-body__cell u-center-text'})[0].text.strip()
    points = row.find_all('td', {'class': 'table-body__cell u-center-text'})[1].text.strip()
    rating = row.find('td', {'class': 'table-body__cell u-text-right rating'}).text.strip()
    rankings[country_name] = {'position': position,
                              'matches': matches,
                              'points': points,
                              'rating': rating}

rankings
Which outputs:
{'New Zealand': {'position': '1',
  'matches': '17',
  'points': '2,054',
  'rating': '121'},
 'England': {'position': '2',
  'matches': '32',
  'points': '3,793',
  'rating': '119'},
 'Australia': {'position': '3',
  'matches': '28',
  'points': '3,244',
  'rating': '116'},
 'India': {'position': '4',
  'matches': '32',
  'points': '3,624',
  'rating': '113'},
 'South Africa': {'position': '5',
  'matches': '25',
  'points': '2,459',
  'rating': '98'},
 'Pakistan': {'position': '6',
  'matches': '27',
  'points': '2,524',
  'rating': '93'},
 'Bangladesh': {'position': '7',
  'matches': '30',
  'points': '2,740',
  'rating': '91'},
 'West Indies': {'position': '8',
  'matches': '30',
  'points': '2,523',
  'rating': '84'},
 'Sri Lanka': {'position': '9',
  'matches': '32',
  'points': '2,657',
  'rating': '83'},
 'Afghanistan': {'position': '10',
  'matches': '17',
  'points': '1,054',
  'rating': '62'},
 'Netherlands': {'position': '11',
  'matches': '7',
  'points': '336',
  'rating': '48'},
 'Ireland': {'position': '12',
  'matches': '25',
  'points': '1,145',
  'rating': '46'},
 'Oman': {'position': '13', 'matches': '11', 'points': '435', 'rating': '40'},
 'Scotland': {'position': '14',
  'matches': '8',
  'points': '308',
  'rating': '39'},
 'Zimbabwe': {'position': '15',
  'matches': '20',
  'points': '764',
  'rating': '38'},
 'Nepal': {'position': '16', 'matches': '11', 'points': '330', 'rating': '30'},
 'UAE': {'position': '17', 'matches': '9', 'points': '190', 'rating': '21'},
 'United States': {'position': '18',
  'matches': '14',
  'points': '232',
  'rating': '17'},
 'Namibia': {'position': '19', 'matches': '6', 'points': '97', 'rating': '16'},
 'Papua New Guinea': {'position': '20',
  'matches': '10',
  'points': '0',
  'rating': '0'}}
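Note that every value in the dictionary is a string, and points use a thousands separator ('2,054'), so they will not sort or sum as numbers. A minimal sketch of one way to convert them, using two entries copied from the output above; to_int and numeric_rankings are names made up for this example:

```python
# Two entries copied from the scraped output, to keep the sketch self-contained.
rankings = {
    'New Zealand': {'position': '1', 'matches': '17', 'points': '2,054', 'rating': '121'},
    'England': {'position': '2', 'matches': '32', 'points': '3,793', 'rating': '119'},
}


def to_int(text):
    """Turn a scraped number such as '2,054' into the integer 2054."""
    return int(text.replace(',', ''))


# Build a new dictionary with the same keys and integer values.
numeric_rankings = {
    team: {field: to_int(value) for field, value in stats.items()}
    for team, stats in rankings.items()
}

print(numeric_rankings['England']['points'])  # 3793
```

This assumes every field is a whole number, which holds for the output shown here; a blank or non-numeric cell would raise a ValueError.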
You can also load it into a pandas DataFrame for further analysis:
import pandas as pd
pd.DataFrame(rankings)
Which outputs:
          New Zealand  England  Australia  India  South Africa  Pakistan  Bangladesh  West Indies  Sri Lanka  Afghanistan  Netherlands  Ireland  Oman  Scotland  Zimbabwe  Nepal  UAE  United States  Namibia  Papua New Guinea
position            1        2          3      4             5         6           7            8          9           10           11       12    13        14        15     16   17             18       19                20
matches            17       32         28     32            25        27          30           30         32           17            7       25    11         8        20     11    9             14        6                10
points          2,054    3,793      3,244  3,624         2,459     2,524       2,740        2,523      2,657        1,054          336    1,145   435       308       764    330  190            232       97                 0
rating            121      119        116    113            98        93          91           84         83           62           48       46    40        39        38     30   21             17       16                 0
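With twenty teams as columns the frame gets very wide. If you would rather have one row per team, DataFrame.from_dict with orient='index' is one option. A short sketch with two entries copied from the dictionary above; the index name 'team' is just a label chosen for this example:

```python
import pandas as pd

# Two entries copied from the scraped dictionary, to keep the sketch self-contained.
rankings = {
    'New Zealand': {'position': '1', 'matches': '17', 'points': '2,054', 'rating': '121'},
    'England': {'position': '2', 'matches': '32', 'points': '3,793', 'rating': '119'},
}

# orient='index' makes each outer key (the team) a row instead of a column.
df = pd.DataFrame.from_dict(rankings, orient='index')
df.index.name = 'team'

print(df)
```

Transposing the original frame with pd.DataFrame(rankings).T should give the same layout.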