I'm getting back to a project I put aside a few months ago. While reviewing my code I got stuck after importing a dataframe: for some reason I can't drop certain columns, and I only need 4 of them.
I'm a beginner btw.
So I'm trying to get data from this table:
import pandas as pd
import requests
url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df.droplevel(level=0, axis='columns').filter(['Rk', 'Player', 'SV%', 'QS%'])
print(df)
But I get the whole table.
What am I doing wrong here?
Thanks a lot in advance!
CodePudding user response:
It's not the most efficient approach, but I saved the dataframe as a .csv:
import pandas as pd
import requests
url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df.to_csv('df1.csv')
and then I changed the first line of the CSV file manually like this:
CodePudding user response:
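Just add df = at the start of the second-to-last line and all is good.
droplevel and filter return a new DataFrame instead of modifying df in place, so the result has to be assigned back to df before printing.
Here it is: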
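A minimal sketch of that fix, written so it runs without hitting the website. The small `raw` table below is a made-up stand-in for `df_list[0]`: the top-level header labels and the goalie rows are placeholders, not real data from the page. Only the last two lines reflect the actual change.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: two header rows, similar to
# the MultiIndex columns pd.read_html produces. Labels and values are made up.
columns = pd.MultiIndex.from_tuples([
    ("Top A", "Rk"),
    ("Top A", "Player"),
    ("Top B", "GP"),
    ("Top B", "SV%"),
    ("Top B", "QS%"),
])
raw = pd.DataFrame(
    [[1, "Goalie A", 10, 0.915, 0.600],
     [2, "Goalie B", 12, 0.902, 0.500]],
    columns=columns,
)

# droplevel and filter each return a new DataFrame, so keep the result.
df = raw.droplevel(level=0, axis="columns").filter(["Rk", "Player", "SV%", "QS%"])
print(df)
```

With the real page, `raw` would presumably be `pd.read_html(url)[0]` from the question, and the assignment line stays the same.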