Pandas - group by quantile position

Time:09-01

I have a pandas dataframe with the following structure:

import pandas as pd
df = pd.DataFrame(
    {
        "value": [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0],
        "group": ["A"] * 6 + ["B"] * 6,
    }
)

I would like to obtain a new dataframe with the same number of rows that contains, for each row, the quantile position of its value within its group.

For this case, the output would be like this:

[Image: expected output table]

There can be a very large number of groups, their values can be on completely different scales, and the groups may differ in size.
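Because the groups can have different scales and sizes, one option is to base the position on ranks rather than raw values, so that only the ordering within each group matters. The sketch below is a minimal illustration, assuming "quantile position" means (rank - 1) / (n - 1) within the group, with tied values sharing their average rank; that definition is an assumption, not something stated in the question. For the sample data it yields the 0.0 to 1.0 column the question describes.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "value": [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0],
        "group": ["A"] * 6 + ["B"] * 6,
    }
)

grouped = df.groupby("group")["value"]
# Rank each value within its group (1 = smallest); ties share the average rank.
ranks = grouped.rank(method="average")
# Number of non-null values in each group, broadcast back to every row.
sizes = grouped.transform("count")
# Assumed definition: map ranks 1..n linearly onto [0, 1].
# A single-row group gives 0 / 0, which pandas turns into NaN.
df["quantile"] = (ranks - 1) / (sizes - 1)
print(df)
```

pandas also offers `rank(pct=True)`, but that returns rank / n, so the smallest value in a group maps to 1/n rather than 0.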

Answer:

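import pandas as pd

df = pd.DataFrame(
    {
        "value": [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0],
        "group": ["A"] * 6 + ["B"] * 6,
    }
)


def norm_group(values):
    # Min-max scale one group's values onto [0, 1]
    g_min = values.min()
    g_max = values.max()
    return (values - g_min) / (g_max - g_min)


# transform() runs norm_group on each group and aligns the result to df's index,
# so the output keeps the original row order and a flat index
df["quantile"] = df.groupby("group")["value"].transform(norm_group)
print(df)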

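This scales each value to [0, 1] within its group (min-max normalization). It gives the expected output for this data, but it only coincides with the quantile position when a group's values are evenly spaced: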

    value group  quantile
0       0     A       0.0
1       1     A       0.2
2       2     A       0.4
3       3     A       0.6
4       4     A       0.8
5       5     A       1.0
6       5     B       1.0
7       4     B       0.8
8       3     B       0.6
9       2     B       0.4
10      1     B       0.2
11      0     B       0.0
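One edge case in the min-max approach: if every value in a group is identical, max - min is zero and pandas returns NaN (0 / 0) for that whole group. Below is a minimal sketch of a guard, using hypothetical sample data with a constant group "C". Mapping constant groups to 0.0 is an arbitrary choice made for this sketch; NaN or 0.5 might suit other uses better.

```python
import pandas as pd

# Hypothetical data: group "C" is constant, so its range is zero.
df = pd.DataFrame(
    {
        "value": [0, 5, 10, 7, 7, 7],
        "group": ["A", "A", "A", "C", "C", "C"],
    }
)

grouped = df.groupby("group")["value"]
g_min = grouped.transform("min")
g_range = grouped.transform("max") - g_min

# Plain min-max scaling; rows in constant groups become NaN (0 / 0).
scaled = (df["value"] - g_min) / g_range
# Keep the scaled value where the range is non-zero, otherwise use 0.0
# (an arbitrary choice for constant groups in this sketch).
df["quantile"] = scaled.where(g_range != 0, 0.0)
print(df)
```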