I have a table that looks like this:
person | subject | grade
-------|---------|------
Cindy  | Math    | 95
Cindy  | English | 88
Cindy  | Science | 93
Mina   | Math    | 78
Mina   | English | 89
Mina   | Science | NaN
Brian  | Math    | NaN
Brian  | English | NaN
Brian  | Science | NaN
I want to remove Brian, since his grade is NaN for every subject.
I can't do
df[~df['grade'].isna()]
because that would also remove Mina's Science row.
CodePudding user response:
Use groupby and transform to filter your dataframe:
df = df[df.groupby('person')['grade'].transform('count') > 0]
print(df)
# Output
person subject grade
0 Cindy Math 95.0
1 Cindy English 88.0
2 Cindy Science 93.0
3 Mina Math 78.0
4 Mina English 89.0
5 Mina Science NaN
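To see why this works, it may help to look at the intermediate result. The sketch below rebuilds the question's data (the variable names `non_missing` and `kept` are just illustrative) and relies on the fact that `count` ignores NaN, so every row receives the number of non-missing grades for its person:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "person": ["Cindy"] * 3 + ["Mina"] * 3 + ["Brian"] * 3,
    "subject": ["Math", "English", "Science"] * 3,
    "grade": [95, 88, 93, 78, 89, np.nan, np.nan, np.nan, np.nan],
})

# count() skips NaN; transform() broadcasts the per-person count back to each row
non_missing = df.groupby("person")["grade"].transform("count")
print(non_missing.tolist())
# [3, 3, 3, 2, 2, 2, 0, 0, 0]

# Keep rows whose person has at least one real grade
kept = df[non_missing > 0]
print(kept["person"].unique().tolist())
# ['Cindy', 'Mina']
```

Mina keeps all three rows, including the NaN one, because her count is 2; only Brian's count is 0.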
CodePudding user response:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "person": ["Cindy", "Cindy", "Cindy", "Mina", "Mina", "Mina", "Brian", "Brian", "Brian"],
    "subject": ["Math", "English", "Science", "Math", "English", "Science", "Math", "English", "Science"],
    "grade": [95, 88, 93, 78, 89, np.nan, np.nan, np.nan, np.nan]
})
# Select Brian's rows with a missing grade (the name is hardcoded here),
# then drop them by index. drop() returns a new DataFrame, so reassign it.
sub = df[(df["person"] == "Brian") & (df["grade"].isnull())]
df = df.drop(sub.index)
print(df)
# Output
person subject grade
0 Cindy Math 95.0
1 Cindy English 88.0
2 Cindy Science 93.0
3 Mina Math 78.0
4 Mina English 89.0
5 Mina Science NaN
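If you don't know in advance which person to drop, one possible variation (a sketch, not part of the answer above; `all_missing` and `result` are names I made up) is to flag the people whose grades are all NaN and drop their rows:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "person": ["Cindy"] * 3 + ["Mina"] * 3 + ["Brian"] * 3,
    "subject": ["Math", "English", "Science"] * 3,
    "grade": [95, 88, 93, 78, 89, np.nan, np.nan, np.nan, np.nan],
})

# True on every row of a person whose grades are ALL missing
all_missing = df["grade"].isna().groupby(df["person"]).transform("all")

# Drop those rows; people with only some NaNs (Mina) are kept
result = df[~all_missing]
print(result)
```

This avoids hardcoding "Brian" and should handle any number of people with no grades at all.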
CodePudding user response:
@corralien's answer is the most straightforward. Another way I was considering is to count the NaNs per person and keep only the people with fewer than 3 (the number of subjects):
df.groupby(['person'])['grade'].apply(lambda x: x.isnull().sum()) < 3
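Note that this expression yields one boolean per person rather than per row, so it still has to be mapped back onto the dataframe. A rough sketch of how that could look, comparing against each person's row count instead of a fixed 3 (`nan_counts`, `group_sizes` and `keep_people` are illustrative names, and the data is rebuilt from the question):

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "person": ["Cindy"] * 3 + ["Mina"] * 3 + ["Brian"] * 3,
    "subject": ["Math", "English", "Science"] * 3,
    "grade": [95, 88, 93, 78, 89, np.nan, np.nan, np.nan, np.nan],
})

# Number of missing grades per person (indexed by person)
nan_counts = df.groupby("person")["grade"].apply(lambda x: x.isnull().sum())

# Number of rows per person, so the threshold is not hardcoded
group_sizes = df.groupby("person").size()

# People with at least one non-missing grade
keep_people = nan_counts.index[nan_counts < group_sizes]

# Map the per-person decision back to rows
result = df[df["person"].isin(keep_people)]
print(result)
```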