How to filter/drop row based on subquery condition within Pandas-CodePudding

I am working in pandas and want to drop few rows based on a sub condition.

Dataset

System	Category
A	1
A	2
B	1
C	1
D	1
D	2
E	2
F	2
G	1

I want to drop system rows having category value as 2 provided it(like A and D) has both category 1 and 2. If only category 2 alone is present for the system(like E and F), then no need to drop it.

Require output as below

System	Category
A	1
B	1
C	1
D	1
E	2
F	2
G	1

I tried pd.drop which has option to drop based on directly removing all system having category 2 but no option to provide a sub condition as explained above. Please suggest

CodePudding user response：

You can first sort the column(first by system then by category). Then remove the duplicate from System.

df.sort_values(['System', 'Category'], ascending=[True, True], inplace = True)
df.drop_duplicates(subset = ['System'], keep = 'first', inplace = True)

Output:

CodePudding user response：

You would want to remove rows with the conditions

Length of System group is > 1 and
Category is 2

m = (df.groupby('System')['Category'].transform(len) > 1) & (df['Category'] == 2)
df = df[~m].reset_index(drop=True)

print(df):

  System  Category
0      A         1
1      B         1
2      C         1
3      D         1
4      E         2
5      F         2
6      G         1