Filter out rows by group where another column is NaN in pandas


I have a table that looks like this:

person | subject | grade
-------------------------
Cindy    Math      95
Cindy    English   88
Cindy    Science   93  
Mina     Math      78
Mina     English   89
Mina     Science   NaN
Brian    Math      NaN
Brian    English   NaN
Brian    Science   NaN

I want to remove Brian since all of his grades are NaN.

I can't do df[~df['grade'].isna()] because that would also remove Mina's Science row.

CodePudding user response:

Use groupby and transform('count') (which counts non-NaN values per person) to keep only the people with at least one grade:

df = df[df.groupby('person')['grade'].transform('count') > 0]
print(df)

# Output
  person  subject  grade
0  Cindy     Math   95.0
1  Cindy  English   88.0
2  Cindy  Science   93.0
3   Mina     Math   78.0
4   Mina  English   89.0
5   Mina  Science    NaN
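An equivalent way to express this, assuming the same example frame, is groupby().filter, which drops whole groups whose grades are all NaN:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "person": ["Cindy", "Cindy", "Cindy", "Mina", "Mina", "Mina", "Brian", "Brian", "Brian"],
    "subject": ["Math", "English", "Science"] * 3,
    "grade": [95, 88, 93, 78, 89, np.nan, np.nan, np.nan, np.nan],
})

# Keep only groups that have at least one non-NaN grade;
# Brian's group is all NaN, so every Brian row is dropped.
out = df.groupby("person").filter(lambda g: g["grade"].notna().any())
print(out)
```

filter re-evaluates the whole group, so it is slower than transform on large frames, but reads closer to the intent.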

CodePudding user response:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "person":["Cindy","Cindy","Cindy","Mina","Mina","Mina","Brian","Brian","Brian"],
    "subject":["Math","English","Science","Math","English","Science","Math","English","Science"],
    "grade":[95,88,93,78,89, np.nan,np.nan,np.nan,np.nan ]
})

sub = df[(df["person"] == "Brian") & (df["grade"].isnull())]
df = df.drop(sub.index)
print(df)

  person  subject  grade
0  Cindy     Math   95.0
1  Cindy  English   88.0
2  Cindy  Science   93.0
3   Mina     Math   78.0
4   Mina  English   89.0
5   Mina  Science    NaN

CodePudding user response:

@corralien's answer is the most straightforward. The other way I was thinking of was using the following per-person mask to filter the dataframe:

df.groupby(['person'])['grade'].apply(lambda x: x.isnull().sum()) < 3
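Note that this expression yields a boolean Series indexed by person, not by row, so to actually filter the frame it has to be mapped back onto the rows. A minimal sketch, assuming the same example frame and that every person has 3 subjects:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "person": ["Cindy", "Cindy", "Cindy", "Mina", "Mina", "Mina", "Brian", "Brian", "Brian"],
    "subject": ["Math", "English", "Science"] * 3,
    "grade": [95, 88, 93, 78, 89, np.nan, np.nan, np.nan, np.nan],
})

# Per-person NaN count; True where fewer than 3 grades are missing
mask = df.groupby(["person"])["grade"].apply(lambda x: x.isnull().sum()) < 3

# Map the person-level mask back onto the row-level frame
out = df[df["person"].map(mask)]
print(out)
```

This keeps Cindy and Mina and drops all of Brian's rows, matching the other answers.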