Home > front end >  Pandas: How to delete rows where 2 conditions in 2 different columns need to be met
Pandas: How to delete rows where 2 conditions in 2 different columns need to be met

Time:12-09

Let's say I have a data frame that looks like this. I want to delete everything with a certain ID if all of its Name values are empty. Like in this example, every name value is missing in the rows where ID is 2. Even if I have 100 rows with the ID 3 and only one name values is present, I want to keep it.

ID Name
1 NaN
1 Banana
1 NaN
2 NaN
2 NaN
2 NaN
3 Apple
3 NaN

So the desired output looks like this:

ID Name
1 NaN
1 Banana
1 NaN
3 Apple
3 NaN

Everything I tried so far was wrong. In this attempt, I tried to count every NaN Value that belongs to an ID, but it still returns me too many rows. This is the closest I got to my desired outcome.

df = df[(df['ID']) & (df['Name'].isna().sum()) != 0]

CodePudding user response:

You want to exclude rows from IDs that have as many NaNs as they have rows. Therefore, you can group by ID and count their number of rows and number of NaNs.

Based on this result, you can get the IDs from people whose row count equals their NaN count and exclude them from your original dataframe.

# Declare column that indicates if `Name` is NaN
df['isna'] = df['Name'].isna().astype(int)

# Declare a dataframe that counts the rows and NaNs per `ID`
counter = df.groupby('ID').agg({'Name':'size', 'isna':'sum'})

# Get ID's from people who have as many NaNs as they have rows
exclude = counter[counter['Name'] == counter['isna']].index.values

# Exclude these IDs from your data
df = df[~df['ID'].isin(exclude)]

CodePudding user response:

Using .groupby and .query

ids = df.groupby(["ID", "Name"]).agg({"Name": "count"}).rename(columns={"Name": "Count"}).reset_index()["ID"].tolist()
df = df.query("ID.isin(@ids)").reset_index(drop=True)
print(df)

Output:

   ID    Name
0   1     NaN
1   1  Banana
2   1     NaN
3   3   Apple
4   3     NaN
  • Related