Removing 1% top and bottom percentiles given a condition

Time:12-18

I have the following dataset and I would like to remove the top and bottom 1% of the values in the column "ROA" within each "PRIMARY_SIC_CODE", i.e., for each PRIMARY_SIC_CODE, take all of its ROA values, find the 1st and 99th percentiles, drop the rows that fall outside them, and keep the rest of the rows in the dataset.

Is there any easy way of doing it? Thanks!

[image of the dataset]

CodePudding user response:

Try along the lines of...

df.groupby("PRIMARY SIC CODE")['ROA'].quantile(q=[0.01, 0.99])
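The per-group quantiles only give the cut-offs; to actually drop rows, one option is to join those cut-offs back onto the rows and compare. This is a minimal sketch, not the only way to do it. The toy data is made up for illustration, the column name "PRIMARY SIC CODE" is assumed to match the real one, and the names `bounds`, `lo`, `hi`, `with_bounds` and `trimmed` are just illustrative.

```python
import pandas as pd

# Toy data, made up for illustration; column names are assumed from the question.
df = pd.DataFrame({
    "PRIMARY SIC CODE": ["A"] * 5 + ["B"] * 5,
    "ROA": [1.0, 2.0, 3.0, 4.0, 100.0, -50.0, 10.0, 11.0, 12.0, 13.0],
})

# One row per group, one column per requested quantile (0.01 first, then 0.99).
bounds = df.groupby("PRIMARY SIC CODE")["ROA"].quantile([0.01, 0.99]).unstack()
bounds.columns = ["lo", "hi"]

# Attach each row's group cut-offs, then keep rows strictly inside them.
with_bounds = df.join(bounds, on="PRIMARY SIC CODE")
inside = (with_bounds["ROA"] > with_bounds["lo"]) & (with_bounds["ROA"] < with_bounds["hi"])
trimmed = df[inside]
print(trimmed)
```

Keeping `bounds` as a separate table also makes it easy to inspect the cut-offs for each code before removing anything.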

CodePudding user response:

If you want to exclude the top and bottom 1% computed over the ROA column as a whole:

top_1perc = df['ROA'].quantile(q=0.99)
bottom_1perc = df['ROA'].quantile(q=0.01)
new_df = df[(df['ROA'] > bottom_1perc) & (df['ROA'] < top_1perc)]

If, instead, you want to exclude them within each PRIMARY SIC CODE group:

mask = df.groupby('PRIMARY SIC CODE')['ROA'].transform(
    lambda x: (x > x.quantile(q=0.01)) & (x < x.quantile(q=0.99))).eq(1)
new_df = df[mask]
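To check that the per-group filter behaves as expected, it can be wrapped in a small helper and run on synthetic data. This is only a sketch: `trim_by_group` is a hypothetical helper name, the data is randomly generated, and the column names are assumed to match the real dataset.

```python
import numpy as np
import pandas as pd

def trim_by_group(df, group_col, value_col, lower=0.01, upper=0.99):
    """Keep rows whose value lies strictly between its group's lower and upper quantiles."""
    grouped = df.groupby(group_col)[value_col]
    # A scalar returned from transform is broadcast to every row of the group.
    lo = grouped.transform(lambda s: s.quantile(lower))
    hi = grouped.transform(lambda s: s.quantile(upper))
    return df[(df[value_col] > lo) & (df[value_col] < hi)]

# Synthetic data: two SIC codes with 1000 random ROA values each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "PRIMARY SIC CODE": np.repeat([100, 200], 1000),
    "ROA": rng.normal(size=2000),
})

trimmed = trim_by_group(df, "PRIMARY SIC CODE", "ROA")
# Rows remaining per group after trimming.
print(trimmed.groupby("PRIMARY SIC CODE").size())
```

Two things to keep in mind: because the comparisons are strict, the minimum and maximum of every group are always dropped, even in very small groups (switch to >= and <= if that is not wanted), and rows where ROA is NaN are dropped too, since comparisons with NaN are False.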