Is there a function in a pandas pivot table to add the difference between multiple columns?

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "foo"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
                  })

with the following output:

print(df)

    A   B   C       D
0   foo one small   1
1   foo one large   2
2   foo one large   2
3   foo two small   3
4   foo two small   3
5   bar one large   4
6   bar one small   5
7   bar two small   6
8   bar two large   7
9   foo two large   8

Then I create a pivot table as follows:

table = pd.pivot_table(df, values='D', index=['A'],
                    columns=['B','C'])

With the following output:

print(table)

B     one         two
C   large small large small
A
bar     4     5     7     6
foo     2     1     8     3

How can I add the difference between large and small (large - small) under both one and two (the diff columns in the table below)? The expected output is:

B     one              two
C   large small diff large small diff
A
bar     4     5   -1     7     6    1
foo     2     1    1     8     3    5

I found some previous answers, but they only handled a single column. Ideally, this would also be done using the aggfunc parameter.
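On the aggfunc part: pivot_table applies aggfunc to the D values of each (A, B, C) cell separately, so it never sees the neighbouring large and small cells and cannot compute their difference on its own. One possible workaround, sketched here under the assumption that the difference should be taken between the per-cell means (pivot_table's default aggfunc), is to compute the differences in long format first and append them as rows with C = "diff"; a single pivot_table call then produces the diff columns. The names cells, diff_rows, long_df and order are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "foo"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8]})

# Mean of D per (A, B, C) cell, i.e. what pivot_table's default aggfunc computes;
# unstacking "C" puts "large" and "small" side by side for each (A, B) pair
cells = df.groupby(["A", "B", "C"])["D"].mean().unstack("C")

# One "diff" row per (A, B) pair, holding large - small
diff_rows = (cells["large"] - cells["small"]).rename("D").reset_index().assign(C="diff")

# Append the diff rows to the long data and pivot once, with the usual aggfunc
long_df = pd.concat([df, diff_rows], ignore_index=True)
table = pd.pivot_table(long_df, values="D", index=["A"], columns=["B", "C"])

# pivot_table sorts level "C" alphabetically (diff, large, small);
# reorder to large, small, diff inside every "B" group
order = pd.MultiIndex.from_product(
    [table.columns.get_level_values("B").unique(), ["large", "small", "diff"]],
    names=["B", "C"])
table = table.reindex(columns=order)
print(table)
```

The trade-off is that the difference itself is computed outside pivot_table, but the pivot stays a single call with its normal aggfunc.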

Additionally, how can I transform the table back into the initial (long) format, with the diff values as extra rows? The expected output is:

  A   B   C     D 
0  foo one small 1 
1  foo one large 2 
2  foo one large 2 
3  foo two small 3 
4  foo two small 3 
5  bar one large 4 
6  bar one small 5 
7  bar two small 6 
8  bar two large 7 
9  foo two large 8 
10 bar one diff -1 
11 bar two diff 1 
12 foo one diff 1 
13 foo two diff 5

Thanks in advance for your help!

Answer:

diffs = (table.groupby(level="B", axis="columns")
              .diff(-1).dropna(axis="columns")
              .rename(columns={"large": "diff"}, level="C"))

new = table.join(diffs).loc[:, table.columns.get_level_values("B").unique()]

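- group the columns by level "B" ("one", "two", ...)
- within each group, subtract the next column from the current one (diff(-1)); since pivot_table sorts level "C" alphabetically, "large" comes before "small", so this computes "large - small"
- the "small" columns have no column after them, so they become all NaN; drop them
- rename the "large" columns, which now hold the differences, to "diff"
- join with the pivoted table, then select by the original "B" labels so each group's "large", "small", "diff" columns stay together in the original "one", "two" order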
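Recent pandas releases (2.1 and later) deprecate the axis argument of DataFrame.groupby, so groupby(level="B", axis="columns") may emit a FutureWarning on newer versions. A minimal alternative sketch, assuming the column level names "B" and "C" from the question, takes the large and small cross-sections with DataFrame.xs, subtracts them, and joins the result back under a C label of "diff". The names diff and order are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "foo"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8]})
table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"])

# xs drops level "C", so both operands have columns "one", "two" and align directly
diff = table.xs("large", axis=1, level="C") - table.xs("small", axis=1, level="C")

# Put the differences back under a two-level (B, C) header with C = "diff"
diff.columns = pd.MultiIndex.from_product([diff.columns, ["diff"]], names=["B", "C"])

# Join, then order the columns as large, small, diff within each "B" group
order = pd.MultiIndex.from_product(
    [table.columns.get_level_values("B").unique(), ["large", "small", "diff"]],
    names=["B", "C"])
new = table.join(diff).reindex(columns=order)
print(new)
```

Unlike diff(-1), this does not depend on "large" sorting before "small"; the subtraction is spelled out by label.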

to get

>>> new

B     one              two
C   large small diff large small diff
A
bar     4     5   -1     7     6    1
foo     2     1    1     8     3    5
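The answer above only covers the first question. For going back to the long format, one caveat applies: pivot_table's default aggfunc is "mean", so duplicate rows in df (for example the two foo/one/large rows) were collapsed into single cells, and the original ten rows cannot be rebuilt from the table alone. A possible sketch, assuming the expected output is simply df with the diff values appended as extra rows: melt the wide table (including its diff columns) back to one row per (A, B, C) cell, keep the C == "diff" rows, and concatenate them onto df. The names long_cells, diff_rows and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "foo"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8]})
table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"])

# Wide table with diff columns (large - small per "B" group, computed with xs)
diff = table.xs("large", axis=1, level="C") - table.xs("small", axis=1, level="C")
diff.columns = pd.MultiIndex.from_product([diff.columns, ["diff"]], names=["B", "C"])
new = table.join(diff)

# melt undoes the pivot: one row per (A, B, C) cell; the column-level names
# "B" and "C" become the variable columns, and ignore_index=False keeps "A"
long_cells = new.melt(value_name="D", ignore_index=False).reset_index()

# Keep only the diff rows (sorted like the expected output) and append them to df
diff_rows = long_cells[long_cells["C"] == "diff"].sort_values(["A", "B"])
result = pd.concat([df, diff_rows], ignore_index=True)
print(result)
```

If the pivoted values come back as floats, the D column of result will be float as well.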