I have the following data:
import pandas as pd

data = [
[1, True],
[1, True],
[1, True],
[1, True],
[2, True],
[2, False],
[2, True],
[3, True],
[3, True],
[3, True],
[3, True],
[4, True],
[4, True],
[4, False],
[5, True],
[5, True],
[5, True],
[5, True],
]
df = pd.DataFrame(data, columns=['ids', 'accept'])
And I would like to filter out all rows whose IDs have at least one False
value in the accept
column. So my result should look like this (note the missing 2 and 4 IDs):
ids accept
0 1 True
1 1 True
2 1 True
3 1 True
4 3 True
5 3 True
6 3 True
7 3 True
8 5 True
9 5 True
10 5 True
11 5 True
I was able to get a boolean Series telling me, for each ID, whether all values in the accept column are True, using the groupby()
and all()
methods:
# Use the groupby() method to group the DataFrame by the 'ids' column
grouped = df.groupby('ids')
# Use the all() method to check whether all values in the 'accept' column are True for each group
accept_all_true = grouped['accept'].all()
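In other words, accept_all_true is a boolean Series indexed by ids. A minimal, self-contained version (using a smaller frame for brevity):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, True], [1, True], [2, True], [2, False], [3, True]],
    columns=['ids', 'accept'],
)

# all() per group: True only for ids whose every 'accept' value is True
accept_all_true = df.groupby('ids')['accept'].all()
print(accept_all_true)
```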
But I am stuck at this point. How can I apply this grouping back to my original DataFrame?
CodePudding user response:
If performance is important, avoid DataFrameGroupBy.filter
because of its poor performance on larger DataFrames.
The first idea is to find the ids
with at least one False
and then filter those ids
out of the output:
df1 = df[~df['ids'].isin(df.loc[~df['accept'],'ids'])]
print (df1)
ids accept
0 1 True
1 1 True
2 1 True
3 1 True
7 3 True
8 3 True
9 3 True
10 3 True
14 5 True
15 5 True
16 5 True
17 5 True
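To unpack the one-liner, the inner selection collects the ids that contain a False, and ~isin drops every row carrying one of those ids. A self-contained sketch (same column names, smaller frame):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, True], [1, True], [2, True], [2, False], [3, True]],
    columns=['ids', 'accept'],
)

# ids that have at least one False in 'accept' (here: only id 2)
bad_ids = df.loc[~df['accept'], 'ids']

# keep only rows whose id is NOT among the bad ids
df1 = df[~df['ids'].isin(bad_ids)]
print(df1)
```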
Or you can use GroupBy.transform
to test whether all values per group are True; performance should be slower:
df1 = df[df.groupby('ids')['accept'].transform('all')]
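A self-contained sketch of the transform approach on a small frame (same column names assumed):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, True], [1, True], [2, True], [2, False], [3, True]],
    columns=['ids', 'accept'],
)

# transform('all') broadcasts each group's all() result back to every row,
# so the mask is True only for rows whose whole group is True
mask = df.groupby('ids')['accept'].transform('all')
df1 = df[mask]
print(df1)
```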
CodePudding user response:
df.groupby('ids').filter(lambda x: x['accept'].sum() == len(x))
Or if you want to use all()
, use the following code:
df.groupby('ids').filter(lambda x: x['accept'].all())
Both give the same result.
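A quick check that the two filter variants agree, and that filter (unlike a rebuilt frame) keeps the original row labels (a sketch on a smaller frame with the same column names):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, True], [1, True], [2, True], [2, False], [3, True]],
    columns=['ids', 'accept'],
)

# sum of a boolean column equals the row count only when every value is True
a = df.groupby('ids').filter(lambda x: x['accept'].sum() == len(x))
b = df.groupby('ids').filter(lambda x: x['accept'].all())
print(a.equals(b))
```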