For pandas agg, is there a way to specify the aggregation function based on the data type? For example, all columns of type object get "first", all floats get "mean", and so on, so that I don't have to type out every column with its aggregating function.
Sample data:
import seaborn as sns
iris = sns.load_dataset('iris')
Desired code:
iris.agg({"object":"first", "float":"mean"})
CodePudding user response:
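I have found a solution:
import numpy as np

def aggMe(x):
    # Pick the aggregation from the dtype kind of each column
    if x.dtype.kind == "i":    # integers
        y = x.median()
    elif x.dtype.kind == "f":  # floats
        y = x.mean()
    elif x.dtype.kind == "O":  # object (e.g. strings)
        y = x.head(1)          # a one-row Series; x.iloc[0] would give a scalar
    else:
        y = np.nan
    return y

iris.agg(aggMe)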
resulting in (the species entry is a one-row Series rather than a scalar, because x.head(1) returns a Series):
sepal_length 5.84333
sepal_width 3.05733
petal_length 3.758
petal_width 1.19933
species 0 setosa
Name: species, dtype: object
dtype: object
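A variant of the same idea that returns a plain scalar for every column could look like the sketch below. It assumes the helpers in pandas.api.types are acceptable for the dtype checks; the small DataFrame and the name agg_by_dtype are made up for illustration and stand in for iris.

```python
import numpy as np
import pandas as pd
from pandas.api import types as ptypes


def agg_by_dtype(col):
    """Return one scalar per column, chosen by the column's dtype."""
    if ptypes.is_integer_dtype(col):
        return col.median()
    if ptypes.is_float_dtype(col):
        return col.mean()
    # Anything else (object, string, category, ...): take the first value
    return col.iloc[0] if len(col) else np.nan


# Hypothetical stand-in for the iris data
df = pd.DataFrame({
    "sepal_length": [5.0, 6.0, 7.0],
    "n_petals": [1, 2, 10],
    "species": ["setosa", "versicolor", "virginica"],
})

# With a plain callable, agg passes each column to the function as a Series
result = df.agg(agg_by_dtype)
print(result)
```

Because every branch returns a scalar, the result is a flat Series with one entry per column instead of a nested Series in the species slot.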
CodePudding user response:
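import numpy as np

agg_funcs = {}  # column name -> aggregation name

def a(x):
    # Record an aggregation for each column based on its dtype
    if x.dtype == np.dtype('float64'):
        agg_funcs[x.name] = "mean"
    elif x.dtype == np.dtype('object'):
        # "first" is a groupby aggregation; a plain DataFrame.agg may reject it
        agg_funcs[x.name] = "first"

iris.apply(a)  # called only to fill agg_funcs
iris.agg(agg_funcs)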
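The same dict can probably be built without a helper function by selecting columns by dtype. This is only a sketch: the small DataFrame stands in for iris, agg_spec is a made-up name, and "count" is used for the non-numeric columns because "first" is not reliably accepted outside groupby.

```python
import pandas as pd

# Hypothetical stand-in for the iris data
df = pd.DataFrame({
    "sepal_length": [5.0, 6.0, 7.0],
    "sepal_width": [3.0, 2.0, 4.0],
    "species": ["setosa", "versicolor", "virginica"],
})

# "number" matches every numeric dtype, so int32, float32, etc. are covered too
numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.select_dtypes(exclude="number").columns

# Map each column name to the aggregation chosen for its dtype group
agg_spec = {col: "mean" for col in numeric_cols}
agg_spec.update({col: "count" for col in other_cols})

result = df.agg(agg_spec)
print(result)
```

Selecting by "number" avoids comparing against a single exact dtype such as float64, which silently skips columns stored with a different width.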
CodePudding user response:
I would map each dtype name to an aggregation and build the dict from iris.dtypes:
import seaborn as sns
iris = sns.load_dataset('iris')
agg_method = {'float64': 'mean', 'object': 'count'}
iris.agg({k: agg_method[str(v)] for k, v in iris.dtypes.items()})
Returns:
sepal_length 5.843333
sepal_width 3.057333
petal_length 3.758000
petal_width 1.199333
species 150.000000
dtype: float64
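One caveat with the lookup above: agg_method[str(v)] raises a KeyError for any dtype that is not in the mapping (an int64 column, for instance). A possible way around it, sketched here on a made-up DataFrame with a hypothetical default_method fallback, is dict.get:

```python
import pandas as pd

# Hypothetical stand-in for the iris data, with an extra integer column
df = pd.DataFrame({
    "sepal_length": [5.0, 6.0, 7.0],
    "n_samples": [1, 2, 3],
    "species": ["setosa", "versicolor", "virginica"],
})

agg_method = {"float64": "mean", "object": "count"}
default_method = "count"  # assumed fallback for dtypes missing from agg_method

# dict.get returns the fallback instead of raising KeyError
spec = {
    col: agg_method.get(str(dtype), default_method)
    for col, dtype in df.dtypes.items()
}

result = df.agg(spec)
print(result)
```

Whether "count" is a sensible fallback depends on the data; the point is only that unknown dtypes no longer stop the aggregation.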