Drop Rows in Pandas Based on Column Value

Time:09-21

I am creating a df with Pandas that has several hundred rows while web scraping a sports website. I am attempting to parse through the rows and drop rows based on the value of a certain column. I've tried looking through W3 and other sites to find the correct method but nothing I've found really seems to match my need. I have my code listed below. Does anyone know of a good way to accomplish this?

import pandas as pd

def rec_career():
    url = 'https://www.pro-football-reference.com/years/2022/receiving.htm'
    base_url = 'https://www.pro-football-reference.com'
    #Establish Dictionary
    player_links = dict()
    # Use Pandas to read table
    table = pd.read_html(url, attrs={'id': 'receiving'})[0]
    table.head()
    table.index = range(len(table))
    for i, row in table.iterrows():
        if row[4] != 'WR' or 'TE':
            table = table.drop(index=i)
    print(table)

rec_career()

The above code returns an empty DataFrame, so it's obviously just parsing through and deleting all the rows, but I am unsure why it is doing that. I'm basically trying to drop players from the df that aren't receivers.

CodePudding user response:

Avoid using a for loop in pandas; pandas has faster, more concise methods for this:

...
table = pd.read_html(url, attrs={'id': 'receiving'})[0]
table.head()
table.index = range(len(table))
table = table[table.Pos.isin(['WR', 'TE'])]
print(table)
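
As for why the original loop emptied the table: `row[4] != 'WR' or 'TE'` is parsed as `(row[4] != 'WR') or 'TE'`, and the non-empty string 'TE' is always truthy, so the condition holds for every row. Here is a minimal sketch of both the pitfall and the `isin` fix. The small DataFrame is made-up sample data standing in for the scraped table (the player names and the `Rec` values are placeholders, not real stats), so it runs without network access:

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped receiving table.
table = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Pos": ["WR", "RB", "TE", "QB"],
    "Rec": [100, 40, 80, 1],
})

# The pitfall: `pos != "WR" or "TE"` evaluates to True or to the string "TE",
# both of which are truthy, so every row (even a WR) would be dropped.
always_true = [bool(pos != "WR" or "TE") for pos in table["Pos"]]

# The fix as a boolean mask: keep only rows whose position is WR or TE.
receivers = table[table["Pos"].isin(["WR", "TE"])].reset_index(drop=True)

# The same result phrased as a drop: remove rows whose position is not WR/TE.
dropped = table.drop(table[~table["Pos"].isin(["WR", "TE"])].index)

print(always_true)
print(receivers)
```

If you do want to keep an explicit comparison, the condition would need to be written as `row['Pos'] not in ('WR', 'TE')`, but the vectorized mask is usually the simpler choice.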