I'm looking to grab the stats table from this example link:
https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2022&month=1000&season1=2022&ind=0&team=0,ts&rost=0&age=0&filter=&players=0&startdate=2022-04-07&enddate=2022-04-07&page=1_50
...however when I grab it, there's an extra column header that I can't seem to get rid of, and the bottom row has some useless "Page size" string that I could also get rid of.
I've provided an example code below for testing, along with some attempts to fix the issue, but to no avail.
from pandas import read_html, set_option
#set_option('display.max_rows', 20)
#set_option('display.max_columns', None)
# Extract the table from the provided link
url = "https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2022&month=1000&season1=2022&ind=0&team=0,ts&rost=0&age=0&filter=&players=0&startdate=2022-04-07&enddate=2022-04-07&page=1_50"
table_of_interest = read_html(url)[-2]
print(table_of_interest)
# Attempt 1 - https://stackoverflow.com/questions/68385659/in-pandas-python-how-do-i-get-rid-of-the-extra-column-header-with-index-numbers
df = table_of_interest.iloc[1:,:-1]
print(df)
# Attempt 2 - https://stackoverflow.com/questions/71379513/remove-extra-column-level-from-dataframe
df = table_of_interest.rename_axis(columns=None)
print(df)
results in output
I want to get rid of that top 1 Page size: select 14 items in 1 pages
column header. How?
CodePudding user response:
You could try as follows:
from pandas import read_html
url = "https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2022&month=1000&season1=2022&ind=0&team=0,ts&rost=0&age=0&filter=&players=0&startdate=2022-04-07&enddate=2022-04-07&page=1_50"
table_of_interest = read_html(url)[-2]
# keep only level 1 from the original MultiIndex cols
table_of_interest.columns = [col[1] for col in table_of_interest.columns]
# get rid of last `row`
table_of_interest = table_of_interest.iloc[:-1]
print(table_of_interest)
# Team G PA HR R RBI SB ... SLG wOBA xwOBA wRC BsR Off Def WAR
0 1 STL 13 41 3 9 9 1 ... .613 .413 NaN 173 0.1 3.6 0.4 0.6
1 2 NYM 16 42 0 5 5 0 ... .400 .388 NaN 157 -0.6 2.2 -0.3 0.3
2 3 MIL 15 40 0 4 4 1 ... .424 .346 NaN 122 0.2 1.2 -0.2 0.2
3 4 CHC 16 35 1 5 5 0 ... .483 .368 NaN 140 -0.5 1.0 -0.2 0.2
4 5 HOU 13 38 2 3 3 1 ... .514 .334 NaN 119 0.1 0.7 -0.1 0.2
5 6 CIN 14 38 1 6 6 0 ... .371 .301 NaN 85 0.0 -0.7 -0.1 0.1
6 7 CLE 14 37 0 1 1 1 ... .242 .252 NaN 66 0.2 -1.4 0.3 0.0
7 8 ARI 18 34 1 4 3 0 ... .231 .276 NaN 77 0.0 -1.1 -0.2 0.0
8 9 SDP 15 36 0 2 2 1 ... .172 .243 NaN 57 0.2 -1.6 -0.1 -0.1
9 10 WSN 15 35 1 1 1 0 ... .313 .243 NaN 51 -0.1 -2.3 -0.3 -0.1
10 11 ATL 14 36 1 3 2 0 ... .226 .227 NaN 44 0.0 -2.6 -0.1 -0.2
11 12 KCR 13 31 0 3 3 0 ... .214 .189 NaN 30 -0.4 -3.1 0.0 -0.2
12 13 LAA 15 31 0 1 1 0 ... .207 .183 NaN 21 0.0 -3.1 0.0 -0.2
13 14 PIT 18 32 0 0 0 0 ... .200 .209 NaN 34 -0.4 -3.0 -0.6 -0.3