for each group I want to calculate percentage change relative to ctr value of "a" (in second index). Each "a" may be in different position.
MRE:
df = pd.DataFrame({"A":[1,1,1,3,3,3, 1,1,1,3,3,3],
"B":["a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c"],
"click":[100, 200, 123, 333, 222, 333, 100, 200, 123, 333, 222, 333],
"exp":[10000, 10000, 10000, 10000, 10000, 10000, 20000, 20000, 20000, 20000, 20000, 20000]})
grp_df = df.groupby(["A", "B"]).sum()
grp_df["ctr"] = grp_df["click"] / grp_df["exp"] * 100
grp_df = (grp_df.groupby(level=0, group_keys=False)
.apply(lambda g: g.sort_values('ctr', ascending=False))
)
grp_df["rank"] = (grp_df.groupby(level=0, group_keys=False)["ctr"]
.rank(ascending=False))
outputs:
click exp ctr rank
A B
1 b 400 30000 1.333333 1.0
c 246 30000 0.820000 2.0
a 200 30000 0.666667 3.0
3 a 666 30000 2.220000 1.5
c 666 30000 2.220000 1.5
b 444 30000 1.480000 3.0
Desired output:
click exp ctr rank relative_p
A B
1 b 400 30000 1.333333 1.0 1.000000
c 246 30000 0.820000 2.0 0.230000
a 200 30000 0.666667 3.0 0.000000
3 a 666 30000 2.220000 1.5 0.000000
c 666 30000 2.220000 1.5 0.000000
b 444 30000 1.480000 3.0 -0.333333
I can achieve this by looping through each row however wondering if there are more efficient way. maybe by leveraging power of pct_change()
operation offered by pandas.
new_dfs = list()
grp1_lst = grp_df.index.get_level_values("A").unique().tolist()
for grp in grp1_lst:
df = grp_df.loc[grp_df.index.get_level_values("A") == grp].copy()
org_ctr = df.loc[df.index.get_level_values("B") == "a"]["ctr"].iloc[0]
df["relative_p"] = (df["ctr"] - org_ctr) /org_ctr
new_dfs.append(df)
new_df = pd.concat(new_dfs)
CodePudding user response:
Idea is replace non a
rows to missing values by Series.where
and then use GroupBy.transform
with GroupBy.first
:
org_ctr = (grp_df['ctr'].where(grp_df.index.get_level_values("B") == "a")
.groupby(level=0)
.transform('first'))
grp_df["relative_p"] = (grp_df["ctr"] - org_ctr) /org_ctr
print (grp_df)
click exp ctr rank relative_p
A B
1 b 400 30000 1.333333 1.0 1.000000
c 246 30000 0.820000 2.0 0.230000
a 200 30000 0.666667 3.0 0.000000
3 a 666 30000 2.220000 1.5 0.000000
c 666 30000 2.220000 1.5 0.000000
b 444 30000 1.480000 3.0 -0.333333