I have a df
:
df_test = sns.load_dataset("flights")
df_test['cat_2'] = np.random.choice(range(10), df_test.shape[0])
df_test.pivot_table(index='month',
columns='year',
values=['passengers', 'cat_2'])\
.swaplevel(0,1, axis=1)\
.sort_index(axis=1, level=0)\
.fillna(0)
I am trying to calculate the difference between cat_2
and passengers
each year
compared to the year before.
What is the best way to achieve this? Desired output would look similar to this:
year 1949 1950 1951
cat_2 passengers % diff cat_2 passengers % diff cat_2 passengers % diff
month
Jan 6 112 0 6 115 115/112 6 90 90/115
Feb 0 118 0 6 126 126/118 6 150 150 / 126
Mar 2 132 0 7 141 7 141
Apr 0 129 0 9 135 9 135
May 5 121 0 4 125 4 125
Jun 1 135 0 3 149 3 149
Jul 6 148 0 5 170 5 170
Aug 5 148 0 2 170 2 170
Sep 1 136 0 4 158 4 158
Oct 5 119 0 5 133 5 133
Nov 0 104 0 1 114 1 114
Dec 7 118 0 1 140 1 140
I only showed the desired calculations for columns passengers
but the same calculations method can be used for cat_2
as well.
As there is nothing to compare the first year I filled the values with 0
.
CodePudding user response:
You can select columns passengers
by DataFrame.xs
, divide by shifted rows by DataFrame.shift
, create MultiIndex
and append to original DataFrame
:
df_test = sns.load_dataset("flights")
df_test['cat_2'] = np.random.choice(range(10), df_test.shape[0])
df = df_test.pivot_table(index='month',
columns='year',
values=['passengers', 'cat_2'])\
.swaplevel(0,1, axis=1)\
.sort_index(axis=1, level=0)\
.fillna(0)
df1 = df.xs('passengers', axis=1, level=1)
df2 = pd.concat({'% diff': df1.div(df1.shift(axis=1)).fillna(0)}, axis=1)
out = (pd.concat([df, df2.swaplevel(axis=1)], axis=1)
.sort_index(axis=1, level=0, sort_remaining=False))
Another idea is shift values by add 1
year, only necessary remove last NaN
column by iloc
:
df1 = df.xs('passengers', axis=1, level=1)
df2 = (pd.concat({'% diff': df1.div(df1.rename(columns=lambda x: x 1))}, axis=1)
.fillna(0)
.iloc[:, :-1])
out = (pd.concat([df, df2.swaplevel(axis=1)], axis=1)
.sort_index(axis=1, level=0, sort_remaining=False))