percentage change relative to specific row for each group pandas

For each group (level "A") I want to calculate the percentage change of ctr relative to the ctr value of the "a" row (in the second index level, "B"). Each "a" may sit at a different position within its group.

MRE:

import pandas as pd

df = pd.DataFrame({"A":[1,1,1,3,3,3, 1,1,1,3,3,3],
                   "B":["a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c"],
                   "click":[100, 200, 123, 333, 222, 333, 100, 200, 123, 333, 222, 333],
                   "exp":[10000, 10000, 10000, 10000, 10000, 10000, 20000, 20000, 20000, 20000, 20000, 20000]})

grp_df = df.groupby(["A", "B"]).sum()
grp_df["ctr"] = grp_df["click"] / grp_df["exp"] * 100

grp_df = (grp_df.groupby(level=0, group_keys=False)
                .apply(lambda g: g.sort_values('ctr', ascending=False))
         )
grp_df["rank"] = (grp_df.groupby(level=0, group_keys=False)["ctr"]
                        .rank(ascending=False))

Output:

     click    exp       ctr  rank
A B
1 b    400  30000  1.333333   1.0
  c    246  30000  0.820000   2.0
  a    200  30000  0.666667   3.0
3 a    666  30000  2.220000   1.5
  c    666  30000  2.220000   1.5
  b    444  30000  1.480000   3.0

Desired output:

     click    exp       ctr  rank  relative_p
A B
1 b    400  30000  1.333333   1.0    1.000000
  c    246  30000  0.820000   2.0    0.230000
  a    200  30000  0.666667   3.0    0.000000
3 a    666  30000  2.220000   1.5    0.000000
  c    666  30000  2.220000   1.5    0.000000
  b    444  30000  1.480000   3.0   -0.333333
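
Here relative_p = (ctr - ctr_a) / ctr_a, where ctr_a is the ctr of the group's "a" row; for example, for A=1, B="c": (0.82 - 0.666667) / 0.666667 ≈ 0.23.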

I can achieve this by looping over the groups, but I'm wondering whether there is a more efficient way, perhaps by leveraging the pct_change() operation offered by pandas.

new_dfs = list()

grp1_lst = grp_df.index.get_level_values("A").unique().tolist()

for grp in grp1_lst:
    sub = grp_df.loc[grp_df.index.get_level_values("A") == grp].copy()
    # ctr of the "a" row within this group
    org_ctr = sub.loc[sub.index.get_level_values("B") == "a", "ctr"].iloc[0]
    sub["relative_p"] = (sub["ctr"] - org_ctr) / org_ctr
    new_dfs.append(sub)

new_df = pd.concat(new_dfs)

CodePudding user response:

The idea is to replace the non-"a" rows with missing values using Series.where, then use GroupBy.transform with GroupBy.first to broadcast each group's "a" value back to every row:

org_ctr = (grp_df['ctr'].where(grp_df.index.get_level_values("B") == "a")
                        .groupby(level=0)
                        .transform('first'))
grp_df["relative_p"] = (grp_df["ctr"] - org_ctr) / org_ctr
print(grp_df)
     click    exp       ctr  rank  relative_p
A B                                          
1 b    400  30000  1.333333   1.0    1.000000
  c    246  30000  0.820000   2.0    0.230000
  a    200  30000  0.666667   3.0    0.000000
3 a    666  30000  2.220000   1.5    0.000000
  c    666  30000  2.220000   1.5    0.000000
  b    444  30000  1.480000   3.0   -0.333333
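
An equivalent result can be written without transform; this is only a sketch, assuming every "A" group really contains a "B" == "a" row: select those rows with DataFrame.xs and broadcast them back across level "A" via the level argument of Series.div.

# ctr of each group's "a" row, indexed by level "A" only
base = grp_df.xs("a", level="B")["ctr"]
# broadcast base across level "A"; (ctr - base) / base == ctr / base - 1
grp_df["relative_p"] = grp_df["ctr"].div(base, level="A") - 1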