I have a dataframe:
df_select = { 'ISIN': ['x123', 'x345' , 'x745' , 'x743', 'x984'],
'ISSUER': ['AH' ,'BZ' , 'NP', 'BZ', 'AH'],
'Opt': [5,4,3,2,1],
'Rating': [5,4,3,4,6],
'control': [1,1,1,1,1]}
df= pd.DataFrame(df_select)
I want to filter those records where the sum of rating is larger than 6
x = df.groupby('ISSUER')['Rating'].sum()
x = x[x>6]
How do I make the final selection on the pandas (and in general, is this the best way to do this?)
df[df['ISSUER'] isin (x.index)]
is not working
CodePudding user response:
Instead of aggregating with sum
you need to use transform
with sum
df.loc[df.groupby('ISSUER')['Rating'].transform('sum').gt(6)]
ISIN ISSUER Opt Rating control
0 x123 AH 5 5 1
1 x345 BZ 4 4 1
3 x743 BZ 2 4 1
4 x984 AH 1 6 1
CodePudding user response:
I assume you are trying to create a resulting dataframe with the complete information of the records with a sum rating larger than 6. In other words, I'm assuming your desired output is:
ISIN ISSUER Opt Rating control
0 x123 AH 5 5 1
1 x345 BZ 4 4 1
3 x743 BZ 2 4 1
4 x984 AH 1 6 1
Working under this assumption you could use the following:
new_df = df.loc[df['ISSUER'].isin(x.index.tolist())]
You can modify the resulting indexes if you would like with:
new_df = new_df.reset_index()