Using pandas, I want to filter out all groups that contain only zero values
So in pseudo-code something like this
df.groupby('my_group')['values'].filter(all(iszero))
Example input dataframe could be something like this
df = pd.DataFrame({'my_group': ['A', 'B', 'C', 'D']*3, 'values': [0 if (x % 4 == 0 or x == 11) else random.random() for x in range(12)]})
my_group values
0 A 0.000000
1 B 0.286104
2 C 0.359804
3 D 0.596152
4 A 0.000000
5 B 0.560742
6 C 0.534575
7 D 0.251302
8 A 0.000000
9 B 0.445010
10 C 0.750434
11 D 0.000000
Here, group A
contains all zero values, so it should be filtered out. Group D
also has a zero value in row 11, but in other rows it has non-zero values, so it shouldn't be filtered out
CodePudding user response:
Here are possible solution from the best to worse performance:
#filtere groups by != 0 and then filter again original column by mask
df1 = df[df['my_group'].isin(df.loc[df['values'].ne(0), 'my_group'])]
#create mask by groupy.transform
df1 = df[df['values'].ne(0).groupby(df['my_group']).transform('any')]
#filtered by lambda function (if large data it is slow)
df1 = df.groupby('my_group').filter(lambda x: x['values'].ne(0).any())
print (df1)
my_group values
1 B 0.286104
2 C 0.359804
3 D 0.596152
5 B 0.560742
6 C 0.534575
7 D 0.251302
9 B 0.445010
10 C 0.750434
11 D 0.000000
CodePudding user response:
IIUC use a condition to keep the rows. For this if any
value in the group is not equal (ne
) to zero, then keep the group:
df2 = df.groupby('my_group').filter(lambda g: g['values'].ne(0).any())
output:
my_group values
1 B 0.286104
2 C 0.359804
3 D 0.596152
5 B 0.560742
6 C 0.534575
7 D 0.251302
9 B 0.445010
10 C 0.750434
11 D 0.000000
Or to get only the indices:
idx = df.groupby('my_group')['values'].filter(lambda s: s.ne(0).any()).index
output: Int64Index([1, 2, 3, 5, 6, 7, 9, 10, 11], dtype='int64')
CodePudding user response:
You can use:
>>> df[df.groupby('my_group')['values'].transform('any')]
my_group values
1 B 0.507089
2 C 0.846842
3 D 0.953003
5 B 0.085316
6 C 0.482732
7 D 0.764508
9 B 0.879005
10 C 0.717571
11 D 0.000000