I scraped a table from pro-football-reference and created a DataFrame, but I seem to be running into an issue, possibly because I have to convert the HTML to a string.
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
rb_r = requests.get('https://www.pro-football-reference.com/years/2021/rushing.htm')
rb_webpage = bs(rb_r.content, features='lxml')
rb_table = rb_webpage.find('table', attrs={'id': 'rushing'})
rb_df = pd.read_html(str(rb_table))[0]
print(rb_df.head().to_string())
Output:
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Games Rushing Unnamed: 14_level_0
Rk Player Tm Age Pos G GS Att Yds TD 1D Lng Y/A Y/G Fmb
0 1 Jonathan Taylor* IND 22 RB 17 17 332 1811 18 107 83 5.5 106.5 4
1 2 Najee Harris* PIT 23 RB 17 17 307 1200 7 62 37 3.9 70.6 0
2 3 Joe Mixon* CIN 25 RB 16 16 292 1205 13 60 32 4.1 75.3 2
3 4 Antonio Gibson WAS 23 RB 16 14 258 1037 7 65 27 4.0 64.8 6
4 5 Dalvin Cook* MIN 26 RB 13 13 249 1159 6 57 66 4.7 89.2 3
I'm trying to remove the "Unnamed: 0_level_0..." header row, but nothing I've tried has worked. Thanks in advance!
Answer:
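You're close. The table has two header rows, so pandas reads them as a two-level column index, and the "Unnamed: ..." labels are placeholders for the empty cells in the top row. Pass the header parameter to pandas.read_html() to use only the second row (index 1) as the column names: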
pd.read_html(str(rb_table), header=1)[0]
Example
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
rb_r = requests.get('https://www.pro-football-reference.com/years/2021/rushing.htm')
rb_webpage = bs(rb_r.content, features='lxml')
rb_table = rb_webpage.find('table', attrs={'id': 'rushing'})
rb_df = pd.read_html(str(rb_table), header=1)[0]
print(rb_df.head().to_string())
Output
| | Rk | Player | Tm | Age | Pos | G | GS | Att | Yds | TD | 1D | Lng | Y/A | Y/G | Fmb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Jonathan Taylor* | IND | 22 | RB | 17 | 17 | 332 | 1811 | 18 | 107 | 83 | 5.5 | 106.5 | 4 |
| 1 | 2 | Najee Harris* | PIT | 23 | RB | 17 | 17 | 307 | 1200 | 7 | 62 | 37 | 3.9 | 70.6 | 0 |
| 2 | 3 | Joe Mixon* | CIN | 25 | RB | 16 | 16 | 292 | 1205 | 13 | 60 | 32 | 4.1 | 75.3 | 2 |
| 3 | 4 | Antonio Gibson | WAS | 23 | RB | 16 | 14 | 258 | 1037 | 7 | 65 | 27 | 4.0 | 64.8 | 6 |
| 4 | 5 | Dalvin Cook* | MIN | 26 | RB | 13 | 13 | 249 | 1159 | 6 | 57 | 66 | 4.7 | 89.2 | 3 |
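
If you already have the DataFrame with the two-level header and would rather not re-parse the HTML, dropping the top level of the column index should give the same column names. The sketch below is only an illustration: it builds a small stand-in frame by hand (a few columns and two rows copied from the output above) instead of hitting the site, and the name flat_df is just a placeholder.

```python
import pandas as pd

# Hypothetical stand-in for the scraped frame: a two-level column header
# shaped like the one read_html() returned in the question.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Games", "G"),
    ("Rushing", "Att"),
    ("Rushing", "Yds"),
])
rb_df = pd.DataFrame(
    [
        [1, "Jonathan Taylor*", 17, 332, 1811],
        [2, "Najee Harris*", 17, 307, 1200],
    ],
    columns=columns,
)

# Drop the top header level so only the second row of names remains.
flat_df = rb_df.droplevel(0, axis=1)
print(flat_df.columns.tolist())
```

One difference from header=1: droplevel works on a frame that is already loaded, so it is handy when the parsing step is out of your hands. Also, recent pandas versions (2.1 and later) warn when read_html() is given a raw HTML string; wrapping it as io.StringIO(str(rb_table)) avoids the warning.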