Home > other >  Groupby and if a column contains zero(s), drop the whole group
Groupby and if a column contains zero(s), drop the whole group

Time:07-31

I have the following dataframe. I want to only keep the group if the group has no zero in the "value" column. My current dataframe looks like the following.

ID   date   value
A    2020Q1  5
A    2020Q2  5
A    2020Q3  7
A    2020Q4  6
A    2021Q1  9
B    2019Q1  4
B    2019Q2  0
B    2019Q3  9
C    2019Q1  3
C    2019Q2  2
C    2019Q3  2
C    2019Q4  0
D    2019Q3  7
D    2019Q4  7
D    2020Q1  8
E    2020Q1  1
E    2020Q2  1
E    2020Q3  1
E    2020Q4  5
F    2018Q1  7
F    2018Q2  8
F    2018Q3  8
G    2018Q1  0
G    2018Q2  0
G    2018Q3  4
G    2018Q4  8
..

I want to drop the whole group if the group's value contains zero.

the desired output is

ID   date   value
A    2020Q1  5
A    2020Q2  5
A    2020Q3  7
A    2020Q4  6
A    2021Q1  9
D    2019Q3  7
D    2019Q4  7
D    2020Q1  8
E    2020Q1  1
E    2020Q2  1
E    2020Q3  1
E    2020Q4  5
F    2018Q1  7
F    2018Q2  8
F    2018Q3  8
..

thanks.

CodePudding user response:

We could use a groupby filter:

df.groupby('ID').filter(lambda x: not any(x['value'] == 0))

Output

   ID    date  value
0   A  2020Q1      5
1   A  2020Q2      5
2   A  2020Q3      7
3   A  2020Q4      6
4   A  2021Q1      9
12  D  2019Q3      7
13  D  2019Q4      7
14  D  2020Q1      8
15  E  2020Q1      1
16  E  2020Q2      1
17  E  2020Q3      1
18  E  2020Q4      5
19  F  2018Q1      7
20  F  2018Q2      8
21  F  2018Q3      8

Data:

import pandas as pd
from io import StringIO

data = StringIO("""
ID,date,value
A,2020Q1,5
A,2020Q2,5
A,2020Q3,7
A,2020Q4,6
A,2021Q1,9
B,2019Q1,4
B,2019Q2,0
B,2019Q3,9
C,2019Q1,3
C,2019Q2,2
C,2019Q3,2
C,2019Q4,0
D,2019Q3,7
D,2019Q4,7
D,2020Q1,8
E,2020Q1,1
E,2020Q2,1
E,2020Q3,1
E,2020Q4,5
F,2018Q1,7
F,2018Q2,8
F,2018Q3,8
G,2018Q1,0
G,2018Q2,0
G,2018Q3,4
G,2018Q4,8
""")

df = pd.read_csv(data, sep = ",")

CodePudding user response:

You could check for rows where value is 0 and get the 'ID' value (mask). Then filter the df with it:

mask = df.loc[df['value'].eq(0), 'ID'].unique()
# mask looks like this: ['B' 'C' 'G']

result = df.loc[~df['ID'].isin(mask),:]

Output result:

   ID    date  value
0   A  2020Q1      5
1   A  2020Q2      5
2   A  2020Q3      7
3   A  2020Q4      6
4   A  2021Q1      9
12  D  2019Q3      7
13  D  2019Q4      7
14  D  2020Q1      8
15  E  2020Q1      1
16  E  2020Q2      1
17  E  2020Q3      1
18  E  2020Q4      5
19  F  2018Q1      7
20  F  2018Q2      8
21  F  2018Q3      8

CodePudding user response:

Check Below code

import pandas as pd
import numpy as np 

df = pd.DataFrame({'col1':['A','A','A','B','B','C','C'],'col2':[1,1,2,0,1,1,2]})
df[df.groupby('col1')['col2'].transform('min') > 0]

Output:

enter image description here

CodePudding user response:

I think you can do it without groupby:

ids_to_delete = df.ID[df.value == 0].unqiue()
df = df.query("~(ID in @ids_to_delete)")

CodePudding user response:

You can use apply and df.drop to remove your undesired group (If you ID is actually your index)

condition = df[df.groupby("ID")['value'].apply(lambda x:x == 0)].index.unique()

df.drop(condition, inplace=True)

CodePudding user response:

cond = df.loc[df['value'] == 0]['ID'].unique()
for index in cond:
    for i in df.index:
        if df.loc[i,'ID'] == index:
            df.drop(index = i, inplace= True)
df

Or Simply (Shortcut version of above)

cond = df.loc[df['value'] == 0]['ID'].to_list()
df.drop(cond, inplace=True)
df
  • Related