Aggregate DataFrame down to one row using different functions

Time:12-22

I have a dataframe with multiple rows, which I'd like to aggregate down, per-column, to a 1-row dataframe, using a different function per-column.

Take the following dataframe, as an example:

import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
print(df)

Result:

   A  B
0  1  2
1  2  3

I'd like to aggregate the first column using sum and the second using mean. There is a convenient DataFrame.agg() method that accepts a mapping from column names to aggregation functions, like so:

aggfns = {
    'A': 'sum',
    'B': 'mean'
}
print(df.agg(aggfns))

However, this results in a Series rather than a DataFrame:

A    3.0
B    2.5
dtype: float64

Among other problems, a Series has a single dtype, so the per-column dtypes are lost: here the integer sum of A is upcast to float64. A Series is well suited to representing a single DataFrame column, but not a single DataFrame row.
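The dtype loss is easy to check directly. A minimal sketch, reusing the df and aggfns from the question (the variable name result is only for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

# Aggregating with a dict of column -> function returns a Series.
result = df.agg(aggfns)
print(type(result).__name__)
print(result.dtype)

# Aggregating column A on its own keeps its integer dtype.
print(df['A'].sum())
```

The last line shows that the integer result is available per column; it is only the packing into one Series that forces a common dtype.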

I managed to come up with this tortured incantation:

df['dummy'] = 0
dfa = df.groupby('dummy').agg(aggfns).reset_index(drop=True)
print(dfa)

This creates a dummy column that is 0 everywhere, groups on it, performs the aggregation, and then drops it. Surely there is a better way?

CodePudding user response:

Use Series.to_frame followed by DataFrame.T (short for transpose):

dfa = df.agg(aggfns).to_frame().T

Output:

>>> dfa
     A    B
0  3.0  2.5
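Note that both columns come out as float64 here, because the values passed through a single-dtype Series first. If the original dtype of A matters, one possible follow-up is to cast it back with astype. This is a sketch under the assumption that the aggregated value of A fits A's original dtype, which holds for a sum of integers but not for every aggregation; the name restored is only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

dfa = df.agg(aggfns).to_frame().T
print(dfa.dtypes)  # both columns share the dtype of the intermediate Series

# Assumption: the sum of A should keep A's original dtype, so cast it back.
restored = dfa.astype({'A': df['A'].dtype})
print(restored.dtypes)
```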

CodePudding user response:

You could add the constant column on the fly with DataFrame.assign, which leaves the original df unmodified:

dfa = df.assign(d=0).groupby('d').agg(aggfns).reset_index(drop=True)

Output:

>>> dfa
   A    B
0  3  2.5
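If the helper column is unwanted altogether, a possible variation is to pass an array of constant group keys to groupby, since groupby also accepts an array with one key per row. A minimal sketch; the name keys is hypothetical and NumPy is assumed to be available:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

# One key per row, all equal, so every row lands in the same group.
keys = np.zeros(len(df), dtype=int)

dfa = df.groupby(keys).agg(aggfns).reset_index(drop=True)
print(dfa)
print(dfa.dtypes)
```

Because each column is aggregated separately inside groupby, A should keep its integer dtype and df itself gains no extra column.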

CodePudding user response:

You can explicitly create a new DataFrame, wrapping each aggregated value in a one-element list:

>>> pd.DataFrame({'A': [df.A.sum()], 'B': [df.B.mean()]})
   A    B
0  3  2.5
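The same idea can be driven by the aggfns mapping from the question, so the column names and functions are not repeated by hand. A rough sketch, assuming every value in aggfns is something Series.agg accepts and reduces to a scalar:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3]], columns=['A', 'B'])
aggfns = {'A': 'sum', 'B': 'mean'}

# Aggregate each column on its own and wrap the scalar in a one-element list,
# so each column of the result gets its own dtype.
dfa = pd.DataFrame({col: [df[col].agg(fn)] for col, fn in aggfns.items()})
print(dfa)
print(dfa.dtypes)
```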