I have a multiindexed dataframe, for example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 2),
                  index=pd.MultiIndex.from_tuples([(1900, 'elem1'), (1900, 'elem2'), (1901, 'elem1'), (1901, 'elem2')]),
                  columns=['col1', 'col2'])
df.index.names = ['y', 'elem']
df
                col1      col2
y    elem
1900 elem1  0.590143 -0.050658
     elem2  0.208803  1.739487
1901 elem1 -2.336184  0.151083
     elem2 -0.217127 -0.511950
I want to append the difference between the two years (1901 minus 1900, computed per elem) to the dataframe as extra rows, as shown below:
                col1      col2
y    elem
1900 elem1  0.590143 -0.050658
     elem2  0.208803  1.739487
1901 elem1 -2.336184  0.151083
     elem2 -0.217127 -0.511950
diff elem1 -2.926327  0.201741
     elem2 -0.425930 -2.251437
Any advice on how I could achieve this? Your help is much appreciated!
CodePudding user response:
Subtract the 1900 rows from the 1901 rows, add a 'diff' label as an extra index level, and concatenate the result back onto the main df:
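# rows for 1901 minus rows for 1900, aligned on 'elem'
temp = (df.loc[1901]
        .sub(df.loc[1900], axis=0)
        # add one 'diff' label per row as a second index level
        .set_index([['diff', 'diff']], append=True)
        # move 'diff' in front of 'elem' to match df's (y, elem) order
        .swaplevel()
       )
# stack the diff rows under the original frame
pd.concat([df, temp])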
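The snippet above hard-codes ['diff', 'diff'], so it only fits a frame with exactly two elem rows per year. A possible variant, offered as a sketch and not the only way to do it, wraps the difference in a dict passed to pd.concat so that the 'diff' label is applied to however many rows there are and the new level is named 'y' explicitly. The variable names diff and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
index = pd.MultiIndex.from_tuples(
    [(1900, "elem1"), (1900, "elem2"), (1901, "elem1"), (1901, "elem2")],
    names=["y", "elem"],
)
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=["col1", "col2"])

# 1901 minus 1900, aligned on the remaining 'elem' level.
diff = df.xs(1901, level="y") - df.xs(1900, level="y")

# Wrapping the frame in a dict prepends the key as a new outer index level;
# names=["y"] gives that level the same name as in df.
diff = pd.concat({"diff": diff}, names=["y"])

# Stack the diff rows under the original rows.
result = pd.concat([df, diff])
print(result)
```

Because the label comes from the dict key, this should keep working if more elem values are added later, as long as both years contain the same elem labels.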