I have a dataframe with multiple rows, which I'd like to aggregate down, per-column, to a 1-row dataframe, using a different function per-column.
Take the following dataframe, as an example:
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
print(df)
Result:
   A  B
0  1  2
1  2  3
I'd like to aggregate the first column using sum and the second using mean. There is a convenient DataFrame.agg() method which can take a map of column names to aggregation functions, like so:
aggfns = {
    'A': 'sum',
    'B': 'mean'
}
print(df.agg(aggfns))
However, this results in a Series rather than a DataFrame:
A    3.0
B    2.5
dtype: float64
Among other problems, a Series has a single dtype, so the per-column datatypes are lost. A Series is well-suited to represent a single dataframe column, but not a single dataframe row.
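To make the dtype concern concrete, here is a small check using the same example data. It is only an illustration; the variable name result is introduced here and is not part of the original code.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

result = df.agg(aggfns)

# Both source columns hold integers.
print(df.dtypes)

# The aggregated Series has one dtype shared by all entries,
# so the integer sum of A is stored as a float next to the mean of B.
print(result.dtype)

# Summing the column on its own keeps the integer result.
print(df['A'].sum())
```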
I managed to come up with this tortured incantation:
df['dummy'] = 0
dfa = df.groupby('dummy').agg(aggfns).reset_index(drop=True)
print(dfa)
This creates a dummy column which is 0 everywhere, groups on it, does the aggregation and drops it. Certainly there is something better?
CodePudding user response:
Use Series.to_frame and DataFrame.T (short for transpose):
dfa = df.agg(aggfns).to_frame().T
Output:
>>> dfa
     A    B
0  3.0  2.5
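Note that this goes through the intermediate Series, so both columns of the result come out as floats. If the intended dtypes are known in advance, one option is to cast afterwards with astype. The sketch below assumes that column A should be an integer; the name fixed is only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

dfa = df.agg(aggfns).to_frame().T

# Both columns inherit the single float dtype of the intermediate Series.
print(dfa.dtypes)

# Assumption: we know A should be an integer column, so cast it back explicitly.
fixed = dfa.astype({'A': 'int64'})
print(fixed.dtypes)
```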
CodePudding user response:
You could add the dummy column with assign instead of creating a new column on the original dataframe; assign returns a copy, so df itself is left unmodified:
dfa = df.assign(d=0).groupby('d').agg(aggfns).reset_index(drop=True)
Output:
>>> dfa
   A    B
0  3  2.5
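As a quick sanity check on the same example data, the snippet below confirms that each column keeps the dtype of its own aggregation and that the helper column (named d here, as in the one-liner above) never ends up in df.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

# assign() returns a new dataframe with the constant column d,
# so the grouping key exists only for the duration of this expression.
dfa = df.assign(d=0).groupby('d').agg(aggfns).reset_index(drop=True)

# A stays an integer column, B is a float column.
print(dfa.dtypes)

# The original dataframe still has only its two columns.
print(df.columns.tolist())
```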
CodePudding user response:
You can explicitly create a new DataFrame():
>>> pd.DataFrame({'A': [df.A.sum()], 'B': [df.B.mean()]})
   A    B
0  3  2.5
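If you would rather not spell out every column by hand, the same idea can be driven by the aggfns dict from the question. This is a sketch of one way to do it, not the only one: each column is aggregated on its own with Series.agg, and each scalar is wrapped in a one-element list so that it becomes a one-row column.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

# Aggregate column by column; no shared Series is involved,
# so each column keeps the dtype of its own result.
dfa = pd.DataFrame({col: [df[col].agg(fn)] for col, fn in aggfns.items()})

print(dfa)
print(dfa.dtypes)
```

Because the columns are built independently, this also works when the aggregations return unrelated types, at the cost of one pass over the data per column.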