Pandas agg: define metric based on data type

With pandas agg, is there a way to specify the aggregation function based on each column's data type? For example, all columns of type object get "first", all floats get "mean", and so on. The goal is to avoid typing out every column with its respective aggregation function.

Sample data:

import seaborn as sns
iris = sns.load_dataset('iris')

Desired code:

iris.agg({"object":"first", "float":"mean"})

CodePudding user response:

I have found a solution:

import numpy as np

def aggMe(x):
    # Pick the aggregation from the column's dtype kind
    if x.dtype.kind == "i":    # integer
        y = x.median()
    elif x.dtype.kind == "f":  # float
        y = x.mean()
    elif x.dtype.kind == "O":  # object (e.g. strings)
        y = x.iloc[0]          # scalar first value; head(1) would return a one-row Series
    else:
        y = np.nan

    return y

iris.agg(aggMe)

resulting in

sepal_length    5.84333
sepal_width     3.05733
petal_length      3.758
petal_width     1.19933
species          setosa
dtype: object
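
As a variation on the dtype-based dispatch above, the same checks can be written with the helpers in pandas.api.types instead of the single-letter kind codes. This is only a minimal sketch: the function name agg_by_dtype and the small DataFrame (used in place of iris so it runs without a download) are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api import types as pdt


def agg_by_dtype(col: pd.Series):
    """Return one aggregated value for a column, chosen by its dtype."""
    if pdt.is_integer_dtype(col):
        return col.median()
    if pdt.is_float_dtype(col):
        return col.mean()
    if pdt.is_object_dtype(col) or pdt.is_string_dtype(col):
        return col.iloc[0]  # first value as a scalar
    return np.nan  # any other dtype is left unaggregated


# Hypothetical stand-in data: one integer, one float and one text column
df = pd.DataFrame({
    "n": [1, 2, 6],
    "width": [0.5, 1.5, 2.5],
    "label": ["a", "b", "c"],
})

# apply() calls the function once per column and collects the scalars in a Series
result = df.apply(agg_by_dtype)
print(result)
```

Since every branch returns a scalar, the result is a single Series indexed by column name.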

CodePudding user response:

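import numpy as np

agg_funcs = {}  # column name -> aggregation; named to avoid shadowing the built-in dict

def a(x):
    # Record the aggregation for column x based on its dtype
    if x.dtype == np.dtype('float64'):
        agg_funcs[x.name] = "mean"
    elif x.dtype == np.dtype('object'):
        # "first" only works as a groupby aggregation; on a plain DataFrame take the first value directly
        agg_funcs[x.name] = lambda s: s.iloc[0]

# Call a() once per column to fill agg_funcs; the return value is not needed
iris.apply(a)

iris.agg(agg_funcs)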
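
The string "first" is a built-in GroupBy aggregation, so a per-dtype mapping of this kind fits most naturally when aggregating per group. The following is a hedged sketch of that use, not part of the answer: the column names site, label and value and the data are invented, and numeric columns are detected with pandas.api.types.is_numeric_dtype rather than by comparing dtypes.

```python
import pandas as pd

# Hypothetical data: "site" is the grouping key
df = pd.DataFrame({
    "site": ["a", "a", "b"],
    "label": ["x", "y", "z"],
    "value": [1.0, 3.0, 5.0],
})

# Numeric columns get "mean", everything else gets "first";
# the grouping key itself is left out of the spec.
spec = {
    col: "mean" if pd.api.types.is_numeric_dtype(df[col]) else "first"
    for col in df.columns
    if col != "site"
}

result = df.groupby("site").agg(spec)
print(result)
```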

CodePudding user response:

I would do:

import seaborn as sns
iris = sns.load_dataset('iris')

agg_method = {'float64': 'mean', 'object':  'count'}

iris.agg({k: agg_method[str(v)] for k, v in iris.dtypes.items()})

Returns:

sepal_length      5.843333
sepal_width       3.057333
petal_length      3.758000
petal_width       1.199333
species         150.000000
dtype: float64
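
If exact dtype names such as 'float64' are too specific (for instance when some numeric columns are float32 or integers), the same dictionary can be built per dtype family with select_dtypes. This is a rough sketch under assumptions: the mapping name agg_by_family and the small DataFrame standing in for iris are hypothetical, and only the two families "numeric" and "everything else" are handled.

```python
import pandas as pd

# Hypothetical stand-in for iris so the sketch runs without a download
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "n_petals": [3, 3, 4],
    "species": ["setosa", "setosa", "virginica"],
})

# One aggregation per dtype family, paired with the columns in that family
agg_by_family = [
    (df.select_dtypes(include="number").columns, "mean"),
    (df.select_dtypes(exclude="number").columns, "count"),
]

# Expand to the {column: aggregation} dict that DataFrame.agg expects
agg_spec = {col: func for cols, func in agg_by_family for col in cols}

result = df.agg(agg_spec)
print(result)
```

Adding another family only takes another (columns, function) pair, provided the selections do not overlap.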