I have data at 1-minute intervals. I want to change the granularity to 5 minutes and calculate basic statistics using .groupby, like this:
df2 = df1.groupby(pd.Grouper(freq='5Min',closed='right',label='right')).agg({
"value1": "mean", "value2": "max",
"value3": "quantile"})
I also want quartile/quantile data, but I can't see how to specify a particular quantile. The default is the 50th percentile (the median). How do I get the 75th percentile for value3?
CodePudding user response:
The values you pass to agg don't have to be strings: they can also be functions.
You could define a custom function like
def q75(series):
    # 75th percentile of the values in one group
    return series.quantile(0.75)
and then pass it to agg like this:
df2 = df1.groupby(pd.Grouper(freq='5Min',closed='right',label='right')).agg({
"value1": "mean", "value2": "max",
"value3": q75})
You can even calculate multiple quantiles for the same column by passing the functions in a list (q25 and q50 are assumed to be defined the same way as q75):
df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
"value1": "mean", "value2": "max", "value3": [q25, q50, q75]})
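To avoid writing q25, q50 and q75 by hand, one option is a small factory function. This is only a sketch: make_quantile and the sample df1 below are made up for illustration and are not part of the question. Each generated function gets its own `__name__`, because agg uses the function name as the column label and complains when several functions in one list share a name.

```python
import numpy as np
import pandas as pd


def make_quantile(q):
    """Return an aggregation function that computes the q-th quantile."""
    def quantile_fn(series):
        return series.quantile(q)
    # agg labels the output column with the function's name, so make it unique
    quantile_fn.__name__ = f"q{int(q * 100)}"
    return quantile_fn


q25, q50, q75 = (make_quantile(q) for q in (0.25, 0.50, 0.75))

# Hypothetical sample data: ten rows at 1-minute intervals
idx = pd.date_range("2023-01-01 00:01", periods=10, freq="min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(1.0, 11.0),
    },
    index=idx,
)

df2 = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right")).agg(
    {"value1": "mean", "value2": "max", "value3": [q25, q50, q75]}
)
print(df2)
```

Because one of the dict values is a list, the result has two-level column labels, so the 75th percentile is reached with df2[("value3", "q75")].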
CodePudding user response:
You can use the groupby.quantile function. It lets you specify the exact quantile and even choose the interpolation method. I'm not sure it is possible to do everything in one step; you may need to compute the quantile separately and then add it to the DataFrame as a column.
Link to the docs: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html
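A rough sketch of that two-step approach, assuming the same kind of 1-minute data as in the question (the sample df1 and the column name value3_q75 are invented here for illustration): the grouped object is reused, once for agg and once for quantile, and the quantile result is assigned as a new column since both share the same 5-minute index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ten rows at 1-minute intervals
idx = pd.date_range("2023-01-01 00:01", periods=10, freq="min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(1.0, 11.0),
    },
    index=idx,
)

grouped = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right"))

# Step 1: the ordinary statistics
stats = grouped.agg({"value1": "mean", "value2": "max"})

# Step 2: the 75th percentile, computed separately and added as a column
stats["value3_q75"] = grouped["value3"].quantile(0.75, interpolation="linear")
print(stats)
```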