How to calculate the quartile statistics of a column using the groupby function?

Time:08-12

I have data at 1-minute intervals. I want to change the granularity to 5 minutes and calculate basic statistics using .groupby, like this:

df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
    "value1": "mean", "value2": "max", "value3": "quantile"})

I also want quartile/quantile statistics, but I can't specify which quantile to compute: "quantile" defaults to the 50th percentile (the median). How do I get the 75th percentile for value3?

CodePudding user response:

The values you pass to agg don't have to be strings; they can also be functions. You could define a custom function like

def q75(series):
    return series.quantile(0.75)

and then pass this to agg like

df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
    "value1": "mean", "value2": "max", "value3": q75})

You can even calculate multiple statistics for the same column by passing them in a list (here q25 and q50 are assumed to be defined the same way as q75):

df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
    "value1": "mean", "value2": "max", "value3": [q25, q50, q75]})

CodePudding user response:

You can use the groupby.quantile function. It lets you specify the exact quantile and even choose the type of interpolation. I'm not sure it is possible to do everything in one step; you may need to compute the quantiles separately and then add them to the DataFrame as a column.

Link to the docs: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html
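A rough sketch of this two-step approach, assuming the same column names as in the question and made-up sample data: aggregate the other columns first, then compute the quantile on the same grouping and attach it as a new column. The column name value3_q75 is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Made-up sample data: ten rows at 1-minute intervals, 00:01 through 00:10
idx = pd.date_range("2024-01-01 00:01", periods=10, freq="min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(10.0),
    },
    index=idx,
)

grouped = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right"))

# Step 1: the aggregations that agg handles with plain string names
stats = grouped.agg({"value1": "mean", "value2": "max"})

# Step 2: the 75th percentile of value3, with an explicit interpolation method
stats["value3_q75"] = grouped["value3"].quantile(0.75, interpolation="linear")
print(stats)
```

Both results share the same 5-minute index, so the column assignment aligns row by row.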
