I would like to make my data frame more aesthetically appealing and drop what I believe are the unnecessary first row and column from the multi-index. I would like the column headers to be: 'Rk', 'Team','Conf','G','Rec','ADJOE',.....,'WAB'
Any help is such appreciated.
import pandas as pd
url = 'https://www.barttorvik.com/#'
df = pd.read_html(url)
df = df[0]
df
CodePudding user response:
You only have to iterate over the existing columns and select the second value. Then you can set the list of values as new columns:
import pandas as pd
url = 'https://www.barttorvik.com/#'
df = pd.read_html(url)
df.columns = [x[1] for x in df.columns]
df.head()
Output:
Rk Team Conf G Rec AdjOE AdjDE Barthag EFG% EFGD% ... ORB DRB FTR FTRD 2P% 2P%D 3P% 3P%D Adj T. WAB
0 1 Gonzaga WCC 24 22-211–0 122.42 89.05 .97491 60.21 421 ... 30.2120 2318 30.4165 21.710 62.21 41.23 37.821 29.111 73.72 4.611
1 2 Houston Amer 25 21-410–2 117.39 89.06 .95982 53.835 42.93 ... 37.26 27.6141 28.2242 33.3247 54.827 424 34.8108 29.418 65.2303 3.416
CodePudding user response:
When you read from HTML, specify the row number you want as header:
df = pd.read_html(url, header=1)[0]
print(df.head())
output:
>>
Rk Team Conf G Rec ... 2P%D 3P% 3P%D Adj T. WAB
0 1 Gonzaga WCC 24 22-211–0 ... 41.23 37.821 29.111 73.72 4.611
1 2 Houston Amer 25 21-410–2 ... 424 34.8108 29.418 65.2303 3.416
2 3 Kentucky SEC 26 21-510–3 ... 46.342 35.478 29.519 68.997 4.89
3 4 Arizona P12 25 23-213–1 ... 39.91 33.7172 31.471 72.99 6.24
4 5 Baylor B12 26 21-59–4 ... 49.2165 35.966 30.440 68.3130 6.15