I am creating a df with Pandas that has several hundred rows while web scraping a sports website. I am attempting to parse through the rows and drop rows based on the value of a certain column. I've tried looking through W3 and other sites to find the correct method, but nothing I've found really matches my need. My code is listed below. Does anyone know of a good way to accomplish this?
import pandas as pd

def rec_career():
    url = 'https://www.pro-football-reference.com/years/2022/receiving.htm'
    base_url = 'https://www.pro-football-reference.com'
    # Establish dictionary
    player_links = dict()
    # Use Pandas to read table
    table = pd.read_html(url, attrs={'id': 'receiving'})[0]
    table.head()
    table.index = range(len(table))
    for i, row in table.iterrows():
        if row[4] != 'WR' or 'TE':
            table = table.drop(index=i)
    print(table)

rec_career()
The above code returns an empty DataFrame, so it's obviously just parsing through and deleting all the rows, but I am unsure why it is doing that. I'm basically trying to drop players from the df that aren't receivers.
CodePudding user response:
Avoid using a for loop in pandas, as pandas has faster and more concise methods:
...
table = pd.read_html(url, attrs={'id': 'receiving'})[0]
table.head()
table.index = range(len(table))
# Keep only the rows whose Pos column is 'WR' or 'TE'
table = table[table.Pos.isin(['WR', 'TE'])]
print(table)
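
As an aside, the reason the original loop deletes every row is operator precedence: row[4] != 'WR' or 'TE' is evaluated as (row[4] != 'WR') or 'TE', and the non-empty string 'TE' is always truthy, so the condition holds for every single row. If you do want to keep the explicit loop, a minimal sketch of the corrected check (reusing column position 4 from your code for the Pos column) would be:

for i, row in table.iterrows():
    # Compare the position value against both strings, not just the first one
    if row[4] not in ('WR', 'TE'):
        table = table.drop(index=i)

That said, the vectorised isin filter above is both shorter and much faster than iterrows for a table of a few hundred rows.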