Home > Software engineering >  how to get quartiles and classify a value according to this quartile range
how to get quartiles and classify a value according to this quartile range

Time:04-14

I have this df:

d = pd.DataFrame({'Name':['Andres','Lars','Paul','Mike'],
                  'target':['A','A','B','C'],
                  'number':[10,12.3,11,6]})

And I want classify each number in a quartile. I am doing this:

(d.groupby(['Name','target','number'])['number']
 .quantile([0.25,0.5,0.75,1]).unstack()
 .reset_index()
 .rename(columns={0.25:"1Q",0.5:"2Q",0.75:"3Q",1:"4Q"})
)

But as you can see, the 4 quartiles are all equal because the code above is calculating per row so if there's one 1 number per row all quartiles are equal.

If a run instead:

d['number'].quantile([0.25,0.5,0.75,1])

Then I have the 4 quartiles I am looking for:

0.25     9.000
0.50    10.500
0.75    11.325
1.00    12.300

What I need as output(showing only first 2 rows)

   Name    target   number     1Q      2Q      3Q        4Q     Rank
0   Andres  A       10.0       9.0    10.5    11.325    12.30     1
1   Lars    A       12.3       9.0    10.5    11.325    12.30     4

you can see all quartiles has the the values considering tall values in the number column. Besides that, now we have a column names Rank that classify the number according to it's quartile. ex. In the first row 10 is within the 1st quartile.

CodePudding user response:

Here's one way that build on the quantiles you've created by making it a DataFrame and joining it to d. Also assigns "Rank" column using rank method:

out = (d.join(d['number'].quantile([0.25,0.5,0.75,1])
              .set_axis([f'{i}Q' for i in range(1,5)], axis=0)
              .to_frame().T
              .pipe(lambda x: x.loc[x.index.repeat(len(d))])
              .reset_index(drop=True))
       .assign(Rank=d['number'].rank(method='dense')))

Output:

     Name target  number   1Q    2Q      3Q    4Q  Rank
0  Andres      A    10.0  9.0  10.5  11.325  12.3   2.0
1    Lars      A    12.3  9.0  10.5  11.325  12.3   4.0
2    Paul      B    11.0  9.0  10.5  11.325  12.3   3.0
3    Mike      C     6.0  9.0  10.5  11.325  12.3   1.0
  • Related