I have a pd.DataFrame with a column MultiIndex of the following structure:
      a1  a1  a1  a2  a2  a2  a3  a3  a3  ...
      b1  b2  b3  b1  b2  b3  b1  b2  b3  ...
row1  values ...
row2
row3
...
I would like to divide each a-group by the last element of its corresponding b3 column. That is, divide column a1-b1 by the last value in a1-b3, and likewise a1-b2 and a1-b3 (so that the last element of a1-b3 becomes 1). For the next group, a2, do the same but divide its columns by the last value in a2-b3, and so on.
I am looking for a simple expression to do this. My current code is very cumbersome: I split the dataframe into groups stored in a dict, divide each group separately, and merge them back together. But that can't be the way to do it, right?
Thanks!
Best, JZ
CodePudding user response:
Use groupby on the first column level:
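# Group the columns by their first level and divide each group by its
# bottom-right value, i.e. the last row of its last column (b3 here).
# Note: groupby(axis=1) is deprecated since pandas 2.1.
df1 = df.groupby(level=0, axis=1).apply(lambda x: x / x.iloc[-1, -1])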
print(df1)
# Output
            a1                       a2                       a3
            b1        b2   b3        b1        b2   b3        b1        b2   b3
row1  0.333333  0.666667  1.0  0.666667  0.833333  1.0  0.777778  0.888889  1.0
row2  0.333333  0.666667  1.0  0.666667  0.833333  1.0  0.777778  0.888889  1.0
row3  0.333333  0.666667  1.0  0.666667  0.833333  1.0  0.777778  0.888889  1.0
Setup:
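import pandas as pd

# Tuple keys become a two-level column MultiIndex (a-level, b-level).
data = {('a1', 'b1'): [1, 1, 1], ('a1', 'b2'): [2, 2, 2], ('a1', 'b3'): [3, 3, 3],
        ('a2', 'b1'): [4, 4, 4], ('a2', 'b2'): [5, 5, 5], ('a2', 'b3'): [6, 6, 6],
        ('a3', 'b1'): [7, 7, 7], ('a3', 'b2'): [8, 8, 8], ('a3', 'b3'): [9, 9, 9]}
# Row labels chosen to match the printed output above.
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])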
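If groupby over columns is not available in your pandas version, a possible alternative is to pick the divisor by name and let div align it on the first column level. This is only a sketch of the same idea, not the approach from the answer above; the names denom and result are made up for illustration, and it assumes every a-group actually has a column labelled b3.

```python
import pandas as pd

# Same sample data as in the setup: tuple keys give a two-level column MultiIndex.
data = {('a1', 'b1'): [1, 1, 1], ('a1', 'b2'): [2, 2, 2], ('a1', 'b3'): [3, 3, 3],
        ('a2', 'b1'): [4, 4, 4], ('a2', 'b2'): [5, 5, 5], ('a2', 'b3'): [6, 6, 6],
        ('a3', 'b1'): [7, 7, 7], ('a3', 'b2'): [8, 8, 8], ('a3', 'b3'): [9, 9, 9]}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])

# Take the b3 column of every group (this drops the b level), then its last row.
# The result is a Series indexed by the first level: a1, a2, a3.
denom = df.xs('b3', axis=1, level=1).iloc[-1]

# Divide along the columns, matching denom's labels against column level 0,
# so each a-group is divided by its own scalar.
result = df.div(denom, axis=1, level=0)
print(result)
```

Unlike x.iloc[-1, -1], this does not rely on b3 being the last column within each group, since the divisor is selected by label.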