Home > other >  Ignoring specific rows in a pandas dataframe
Ignoring specific rows in a pandas dataframe

Time:05-24

I want to calculate the median of the "MP" and "FG" column. But I came across the problem that I cant calculate it when some rows in the array/table has strings in it. Like in row 8,18,19,20,21,24. As you can see in the picture.

https://i.stack.imgur.com/SybWM.png

List of NBA Stats

I tried different ways to do it. But my latest idea was this:

resultfg1920 = morantSeason1920.loc[morantSeason1920['FG'] == float]

print (resultfg1920)

Output

Empty DataFrame
Columns: [Rk, G, Date, Age, Tm, Unnamed: 5, Opp, Unnamed: 7, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, FT, FTA, FT%, ORB, DRB, TRB, AST, STL, BLK, TOV, PF, PTS, GmSc,  /-]
Index: []

This is what I get from it, like just the first row, so all column names.

CodePudding user response:

You can't directly compare an object to its type using ==. Try

resultfg1920 = morantSeason1920.loc[morantSeason1920['FG'].apply(type) == float]

or

resultfg1920 = morantSeason1920.loc[morantSeason1920['FG'].apply(type) != str]

print (resultfg1920)

CodePudding user response:

I would start by either removing the rows with string values or replacing them with null. If you replace it with null, you will still have to remove them to return the correct answer.

Do this for whatever column you need calculations from:

df[df.columnName != 'text']

Once you remove the strings, you can use df.describe() to print out all of the data you need including median.

There are many ways to do this, this method is considered more manual in terms of handling a dataframe.

  • Related