In my dataframe I want to delete those groups of column B, in which all values in column C are smaller than 3.
So there should only be those groups left, which only have values in column C that are bigger than 3.
B | C |
---|---|
11 | 1 |
22 | 2 |
11 | 2 |
22 | 4 |
22 | 1 |
33 | 2 |
33 | 1 |
22 | 4 |
So in my example only group 22 should stay.
Probably something like this pseudo code:
df_clean = df.groupby('B')['C']< 3.0
How do I code an algorithm that can do this?
CodePudding user response:
As I understand, each group must have all its value less than 3 to be considered, I'd start by getting the groups that satisfy this condition by comparing the maximum value with the target:3
>>> groups = [group for group in df['C'].unique() if max(df[df.C==group].B.values) < 3]
>>> groups
[11, 33]
then, you can slice your dataframe and get a new one with only the desired groups
>>> df[df.C.isin(groups)]
C B
0 11 1
2 11 2
5 33 2
6 33 1
CodePudding user response:
You can use groupby
and any
with a for loop to get your desired output.
for i ,j in df.groupby('B'):
if (j['C'] >= 3).any() == True:
result = j
B C
1 22 2
3 22 4
4 22 1
7 22 4
or other way round if you are looking for groups with all values less than 3.
result = []
for i ,j in df.groupby('B'):
if (j['C'] < 3).all() == True:
result.append(j)
[ B C
0 11 1
2 11 2,
B C
5 33 2
6 33 1]