Let's say I have a data frame that looks like this. I want to delete everything with a certain ID if all of its Name values are empty. Like in this example, every name value is missing in the rows where ID is 2. Even if I have 100 rows with the ID 3 and only one name values is present, I want to keep it.
ID | Name |
---|---|
1 | NaN |
1 | Banana |
1 | NaN |
2 | NaN |
2 | NaN |
2 | NaN |
3 | Apple |
3 | NaN |
So the desired output looks like this:
ID | Name |
---|---|
1 | NaN |
1 | Banana |
1 | NaN |
3 | Apple |
3 | NaN |
Everything I tried so far was wrong. In this attempt, I tried to count every NaN Value that belongs to an ID, but it still returns me too many rows. This is the closest I got to my desired outcome.
df = df[(df['ID']) & (df['Name'].isna().sum()) != 0]
CodePudding user response:
You want to exclude rows from IDs that have as many NaNs as they have rows. Therefore, you can group by ID and count their number of rows and number of NaNs.
Based on this result, you can get the IDs from people whose row count equals their NaN count and exclude them from your original dataframe.
# Declare column that indicates if `Name` is NaN
df['isna'] = df['Name'].isna().astype(int)
# Declare a dataframe that counts the rows and NaNs per `ID`
counter = df.groupby('ID').agg({'Name':'size', 'isna':'sum'})
# Get ID's from people who have as many NaNs as they have rows
exclude = counter[counter['Name'] == counter['isna']].index.values
# Exclude these IDs from your data
df = df[~df['ID'].isin(exclude)]
CodePudding user response:
ids = df.groupby(["ID", "Name"]).agg({"Name": "count"}).rename(columns={"Name": "Count"}).reset_index()["ID"].tolist()
df = df.query("ID.isin(@ids)").reset_index(drop=True)
print(df)
Output:
ID Name
0 1 NaN
1 1 Banana
2 1 NaN
3 3 Apple
4 3 NaN