I'm confused about the behavior of groupby when axis=1 is passed. For example:
from pandas import DataFrame
junk_df = DataFrame(data={"c1": [1, 1, 2, 2, 3, 3],
                          "c2": [1, 1, 3, 3, 5, 5],
                          "c3": [2, 2, 3, 3, 4, 4]},
                    index=["r1", "r2", "r3", "r3", "r5", "r6"])
print(junk_df)
    c1  c2  c3
r1   1   1   2
r2   1   1   2
r3   2   3   3
r3   2   3   3
r5   3   5   4
r6   3   5   4
Now a groupby with axis=0:
print(junk_df.groupby("c1", axis=0).mean())
gives the expected aggregation of rows by the values in c1:
    c2  c3
c1
1    1   2
2    3   3
3    5   4
But switching the groupby to aggregate over columns instead of rows raises a KeyError.
print(junk_df.groupby("r1", axis=1).mean())
gives
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-30-f6b95242fc39> in <module>
----> 1 print(junk_df.groupby("r1", axis=1).mean())
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
6509 axis = self._get_axis_number(axis)
6510
-> 6511 return DataFrameGroupBy(
6512 obj=self,
6513 keys=by,
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
523 from pandas.core.groupby.grouper import get_grouper
524
--> 525 grouper, exclusions, obj = get_grouper(
526 obj,
527 keys,
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
779 in_axis, name, level, gpr = False, None, gpr, None
780 else:
--> 781 raise KeyError(gpr)
782 elif isinstance(gpr, Grouper) and gpr.key is not None:
783 # Add key to exclusions
KeyError: 'r1'
Does anyone understand how to aggregate columns by values in a row?
Thanks!
CodePudding user response:
axis=1 refers to columns. "r1" is a row label, not one of the column labels, hence the KeyError.
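To see why the string lookup fails, here is a minimal sketch that rebuilds the frame from the question and checks where the label "r1" actually lives. The variable names in_columns, in_index and row_values are only illustrative.

```python
import pandas as pd

# Same data as in the question (note the duplicated "r3" row label).
junk_df = pd.DataFrame(
    {"c1": [1, 1, 2, 2, 3, 3],
     "c2": [1, 1, 3, 3, 5, 5],
     "c3": [2, 2, 3, 3, 4, 4]},
    index=["r1", "r2", "r3", "r3", "r5", "r6"],
)

# "r1" is a row label, so it is found in the index but not in the columns.
in_columns = "r1" in junk_df.columns
in_index = "r1" in junk_df.index

# The values of that row, labeled by column name, are what we want to group on.
row_values = junk_df.loc["r1"]

print(in_columns, in_index)
print(row_values)
```

The row itself (a Series indexed by c1, c2, c3) is the thing that can act as a grouping key for the columns, which is what both answers end up using in one form or another.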
You can use junk_df.T, which transposes the dataframe (basically rotates it 90 degrees), so the columns become the rows and the rows become the columns. Then groupby("r1") will work.
>>> junk_df.T.groupby("r1", as_index=False).mean()
   r1   r2   r3   r3   r5   r6
0   1  1.0  2.5  2.5  4.0  4.0
1   2  2.0  3.0  3.0  4.0  4.0
And then you can flip it back with another .T:
>>> junk_df.T.groupby("r1", as_index=False).mean().T
      0    1
r1  1.0  2.0
r2  1.0  2.0
r3  2.5  3.0
r3  2.5  3.0
r5  4.0  4.0
r6  4.0  4.0
CodePudding user response:
One approach:
res = junk_df.groupby(junk_df.loc["r1"], axis=1).mean()
print(res)
Output
r1    1    2
r1  1.0  2.0
r2  1.0  2.0
r3  2.5  3.0
r3  2.5  3.0
r5  4.0  4.0
r6  4.0  4.0
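A note for newer pandas versions: as far as I know, the axis argument of DataFrame.groupby was deprecated in pandas 2.1, and the documentation suggests transposing instead. The sketch below is one way to get the same result without axis=1, by passing the "r1" row as the grouping key on the transposed frame. The names key and res are just illustrative.

```python
import pandas as pd

# Same data as in the question.
junk_df = pd.DataFrame(
    {"c1": [1, 1, 2, 2, 3, 3],
     "c2": [1, 1, 3, 3, 5, 5],
     "c3": [2, 2, 3, 3, 4, 4]},
    index=["r1", "r2", "r3", "r3", "r5", "r6"],
)

# Grouping key: the values of row "r1", indexed by column name (c1, c2, c3).
key = junk_df.loc["r1"]

# Transpose so the original columns become rows, group those rows by the key,
# take the mean, then transpose back to the original orientation.
res = junk_df.T.groupby(key).mean().T
print(res)
```

Because the key is passed as a Series rather than a string, it is aligned on the transposed frame's index (c1, c2, c3), and the "r1" row stays in the result, which should match the output above.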