Home > other >  Deleting certain rows in dataframe with groupby
Deleting certain rows in dataframe with groupby

Time:10-21

I have a dataframe of companies (represented by 'SCU_KEY') that are either customers or just prospects (shown as 'REVENUE_STATUS_FLAG'). Most SCU_KEYs have multiple entries and is updated regularly throughout time. I can't just delete rows that contain 0 in the dataframe, because it's important in my analysis to see moments when it goes from 1 to 0 or if 0 then returns to 1. My best assumption to solve it is through groupby, where if I group the SCU_KEY and the summation REVENUE_STATUS_FLAG is 0, that means that company (SCU_KEY) has only been a prospect and has not had an instance of adopting the product (open to hear alternative ways such as a lambda function for the df/groupby). The code is below and again the goal is to delete rows that contain 0 in the groupby (also containing the image of my returned code) and then make sure those SCU_KEYs are no longer accounted for in the df:

image of df

df_4.groupby('SCU_KEY')[['REVENUE_STATUS_FLAG']].sum()

CodePudding user response:

Use transform and a boolean mask to filter your dataframe:

df_4[df_4.groupby('SCU_KEY')['REVENUE_STATUS_FLAG'].transform('sum').gt(0)]

Example:

>>> df_4
   SCU_KEY  REVENUE_STATUS_FLAG
0        1                    0  # drop
1        1                    0  # drop
2        1                    0  # drop
3        2                    0
4        2                    1
5        2                    0
6        3                    0
7        3                    0
8        3                    2

>>> df_4[df_4.groupby('SCU_KEY')['REVENUE_STATUS_FLAG'].transform('sum').gt(0)]
   SCU_KEY  REVENUE_STATUS_FLAG
3        2                    0
4        2                    1
5        2                    0
6        3                    0
7        3                    0
8        3                    2

CodePudding user response:

You can also use:

df = pd.DataFrame(data={'scu_key':[1,1,1,2,2,2,3,3,3], 'revenue_status_flag':[0,0,0,1,0,1,1,0,1]})
df = df[df.groupby('scu_key')['revenue_status_flag'].transform(lambda x:  ~(x==0).all())]

Input:

   scu_key  revenue_status_flag
0        1                    0
1        1                    0
2        1                    0
3        2                    1
4        2                    0
5        2                    1
6        3                    1
7        3                    0
8        3                    1

Output:

   scu_key  revenue_status_flag
3        2                    1
4        2                    0
5        2                    1
6        3                    1
7        3                    0
8        3                    1
  • Related