pandas aggregating columns with groupby axis=1 gives KeyError

I'm confused about the behavior of groupby when we pass axis=1. For example:

from pandas import DataFrame
junk_df = DataFrame(data = {"c1": [1,1,2,2,3,3], 
                            "c2": [1,1,3,3,5,5], 
                            "c3": [2,2,3,3,4,4]}, 
                            index=["r1", "r2", "r3", "r3", "r5", "r6"])
print(junk_df)

    c1  c2  c3
r1   1   1   2
r2   1   1   2
r3   2   3   3
r3   2   3   3
r5   3   5   4
r6   3   5   4

Now a groupby with axis=0:

print(junk_df.groupby("c1", axis=0).mean())

gives the expected aggregation of rows by the values in c1.
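For reference, here is a self-contained version of the axis=0 call and the numbers it computes. axis=0 is the default, so it is omitted here; as far as I know, since pandas 2.1 passing the axis keyword at all emits a FutureWarning.

```python
import pandas as pd

# The question's data, including the duplicated "r3" row label.
junk_df = pd.DataFrame(
    {"c1": [1, 1, 2, 2, 3, 3],
     "c2": [1, 1, 3, 3, 5, 5],
     "c3": [2, 2, 3, 3, 4, 4]},
    index=["r1", "r2", "r3", "r3", "r5", "r6"],
)

# Default axis=0: rows sharing a value in c1 are averaged together,
# and the distinct c1 values (1, 2, 3) become the new index.
by_c1 = junk_df.groupby("c1").mean()
print(by_c1)
# Group means: c1=1 -> c2=1.0, c3=2.0; c1=2 -> 3.0, 3.0; c1=3 -> 5.0, 4.0
```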

But switching the groupby to aggregate over columns instead of rows raises a KeyError.

print(junk_df.groupby("r1", axis=1).mean())

gives

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-f6b95242fc39> in <module>
----> 1 print(junk_df.groupby("r1", axis=1).mean())

~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
   6509         axis = self._get_axis_number(axis)
   6510 
-> 6511         return DataFrameGroupBy(
   6512             obj=self,
   6513             keys=by,

~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
    523             from pandas.core.groupby.grouper import get_grouper
    524 
--> 525             grouper, exclusions, obj = get_grouper(
    526                 obj,
    527                 keys,

~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
    779                 in_axis, name, level, gpr = False, None, gpr, None
    780             else:
--> 781                 raise KeyError(gpr)
    782         elif isinstance(gpr, Grouper) and gpr.key is not None:
    783             # Add key to exclusions

KeyError: 'r1'

Does anyone know how to aggregate columns by the values in a given row?

Thanks!

Answer:

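With axis=1 it is the columns that get grouped, so the key has to supply one value per column, i.e. a row. A string key, however, is looked up among the column labels, and "r1" is a row label, not a column label, hence the KeyError.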
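A quick way to see the label mismatch behind the KeyError. For a DataFrame, `in` tests the column labels, which is where a string groupby key is resolved (together with index level names; the lookup happens in pandas' internal get_grouper, visible in the traceback, and its details vary between versions). The snippet only illustrates where "r1" does and does not live.

```python
import pandas as pd

junk_df = pd.DataFrame(
    {"c1": [1, 1, 2, 2, 3, 3],
     "c2": [1, 1, 3, 3, 5, 5],
     "c3": [2, 2, 3, 3, 4, 4]},
    index=["r1", "r2", "r3", "r3", "r5", "r6"],
)

# `in` on a DataFrame checks the column labels.
print("r1" in junk_df)        # False: "r1" is not a column label
print("r1" in junk_df.index)  # True: "r1" is a row label

# The r1 row is a Series indexed by the column labels c1, c2, c3,
# i.e. one value per column, which is what grouping columns needs.
r1_row = junk_df.loc["r1"]
print(r1_row.tolist())        # [1, 1, 2]
```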

You can use junk_df.T, which transposes the DataFrame (flips it over its diagonal) so that the rows become the columns and the columns become the rows. "r1" is then a column label, so groupby("r1") works:

>>> junk_df.T.groupby("r1", as_index=False).mean()
   r1   r2   r3   r3   r5   r6
0   1  1.0  2.5  2.5  4.0  4.0
1   2  2.0  3.0  3.0  4.0  4.0

And then you can flip it back with another .T:

>>> junk_df.T.groupby("r1", as_index=False).mean().T
      0    1
r1  1.0  2.0
r2  1.0  2.0
r3  2.5  3.0
r3  2.5  3.0
r5  4.0  4.0
r6  4.0  4.0
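The transpose trick can be wrapped in a small helper. `mean_columns_by_row` is a hypothetical name used for illustration, not a pandas API. This sketch leaves out as_index=False, so the grouping row is consumed as the key: its values become the column labels and "r1" becomes the columns' name, instead of appearing as a row of the result.

```python
import pandas as pd


def mean_columns_by_row(df: pd.DataFrame, row_label) -> pd.DataFrame:
    """Average the columns of `df` that share a value in row `row_label`.

    Hypothetical helper (not a pandas API): transpose so the row becomes a
    column, group on that column, then transpose back.
    """
    return df.T.groupby(row_label).mean().T


junk_df = pd.DataFrame(
    {"c1": [1, 1, 2, 2, 3, 3],
     "c2": [1, 1, 3, 3, 5, 5],
     "c3": [2, 2, 3, 3, 4, 4]},
    index=["r1", "r2", "r3", "r3", "r5", "r6"],
)

result = mean_columns_by_row(junk_df, "r1")
# Columns c1 and c2 (r1 == 1) are averaged together; c3 (r1 == 2) stands alone.
print(result)
```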

Answer:

One approach is to pass the r1 row itself, a Series indexed by the column labels, as the grouping key:

res = junk_df.groupby(junk_df.loc["r1"], axis=1).mean()
print(res)

Output

r1    1    2
r1  1.0  2.0
r2  1.0  2.0
r3  2.5  3.0
r3  2.5  3.0
r5  4.0  4.0
r6  4.0  4.0
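pandas 2.1 deprecated the axis keyword of DataFrame.groupby (the deprecation message suggests frame.T.groupby(...) instead), so recent versions warn on the call above and the keyword is slated for removal. A sketch that produces the same numbers without axis=1, assuming a reasonably recent pandas: map each column label to its value in r1 and group the transposed frame by that dict, which groupby accepts as a mapping from index labels to group keys. Unlike the output above, the resulting columns are not named r1.

```python
import pandas as pd

junk_df = pd.DataFrame(
    {"c1": [1, 1, 2, 2, 3, 3],
     "c2": [1, 1, 3, 3, 5, 5],
     "c3": [2, 2, 3, 3, 4, 4]},
    index=["r1", "r2", "r3", "r3", "r5", "r6"],
)

# Map each column label to its value in row "r1": {"c1": 1, "c2": 1, "c3": 2}.
col_to_group = junk_df.loc["r1"].to_dict()

# After .T the original column labels form the index; groupby maps each
# index label through the dict to get its group key (1 or 2).
res = junk_df.T.groupby(col_to_group).mean().T
print(res)
```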