Home > database >  How to delete groups based on conditions/values of another column
How to delete groups based on conditions/values of another column

Time:09-02

In my dataframe I want to delete those groups of column B, in which all values in column C are smaller than 3.

So there should only be those groups left, which only have values in column C that are bigger than 3.

B C
11 1
22 2
11 2
22 4
22 1
33 2
33 1
22 4

So in my example only group 22 should stay.

Probably something like this pseudo code:

df_clean = df.groupby('B')['C']< 3.0

How do I code an algorithm that can do this?

CodePudding user response:

As I understand, each group must have all its value less than 3 to be considered, I'd start by getting the groups that satisfy this condition by comparing the maximum value with the target:3

>>> groups = [group for group in df['C'].unique() if max(df[df.C==group].B.values) < 3]
>>> groups
[11, 33]

then, you can slice your dataframe and get a new one with only the desired groups

>>> df[df.C.isin(groups)]
    C  B
0  11  1
2  11  2
5  33  2
6  33  1

CodePudding user response:

You can use groupby and any with a for loop to get your desired output.

for i ,j  in df.groupby('B'):
    if (j['C'] >= 3).any() == True:
        result = j

    B  C
1  22  2
3  22  4
4  22  1
7  22  4

or other way round if you are looking for groups with all values less than 3.

result = []
for i ,j  in df.groupby('B'):
    if (j['C'] < 3).all() == True:
        result.append(j)
        
[    B  C
 0  11  1
 2  11  2,
     B  C
 5  33  2
 6  33  1]
  • Related