Home > Net >  How to group and filter in python pandaas
How to group and filter in python pandaas

Time:11-25

I have a dataframe:

df_select = { 'ISIN': ['x123', 'x345' , 'x745' , 'x743', 'x984'], 
              'ISSUER': ['AH'  ,'BZ' , 'NP', 'BZ', 'AH'], 
              'Opt': [5,4,3,2,1],
             'Rating': [5,4,3,4,6],
              'control': [1,1,1,1,1]}
df= pd.DataFrame(df_select)

I want to filter those records where the sum of rating is larger than 6

x = df.groupby('ISSUER')['Rating'].sum()
x = x[x>6]

How do I make the final selection on the pandas (and in general, is this the best way to do this?)

df[df['ISSUER'] isin (x.index)] 

is not working

CodePudding user response:

Instead of aggregating with sum you need to use transform with sum

df.loc[df.groupby('ISSUER')['Rating'].transform('sum').gt(6)]

   ISIN ISSUER  Opt  Rating  control
0  x123     AH    5       5        1
1  x345     BZ    4       4        1
3  x743     BZ    2       4        1
4  x984     AH    1       6        1

CodePudding user response:

I assume you are trying to create a resulting dataframe with the complete information of the records with a sum rating larger than 6. In other words, I'm assuming your desired output is:

  ISIN ISSUER  Opt  Rating  control
0  x123     AH    5       5        1
1  x345     BZ    4       4        1
3  x743     BZ    2       4        1
4  x984     AH    1       6        1

Working under this assumption you could use the following:

new_df = df.loc[df['ISSUER'].isin(x.index.tolist())]

You can modify the resulting indexes if you would like with:

new_df = new_df.reset_index()
  • Related