import pandas as pd
from functools import partial
def maxx(x, y, take_higher):
    """
    :param x: some column in the df
    :param y: some column in the df
    :param take_higher: bool
    :return: if take_higher is True: max(max(x), max(y)), else: min(max(x), max(y))
    """
    pass
df = pd.DataFrame({'cat': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0], 'x': [10, 15, 5, 11, 0, 4.3, 5.1, 8, 10, 12], 'y': [1, 3, 5, 1, 0, 4.3, 1, 0, 2, 2], 'z': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] })
My goal is to apply the maxx function to each group (grouped by cat). It should take BOTH columns x and y as input, and I would like to specify which column names are used as x and y in the function. I would also like to pass the take_higher parameter (that is why I imported functools.partial, so the function can be wrapped with the parameter already set). Lastly, I would like to apply the function with both take_higher=True and take_higher=False.
I am trying to do something like:
df.groupby(df.cat).agg(partial(maxx, take_higher=True), partial(maxx, take_higher=False))
but obviously, it does not work. I don't know how to specify which columns should be taken into account. How can I do it?
CodePudding user response:
You can use apply, which passes each group to the function as a DataFrame, so the function can read both x and y:
def maxx(gdf, take_higher):
    # gdf is the sub-DataFrame for one group
    if take_higher:
        return max(max(gdf.x), max(gdf.y))
    else:
        return min(max(gdf.x), max(gdf.y))
df.groupby(df.cat).apply(lambda g: maxx(g, take_higher=False))
# do both aggregations in one call
df.groupby(df.cat).apply(lambda g: pd.Series({'maxx_min': maxx(g, take_higher=False), 'maxx_max': maxx(g, take_higher=True)}))
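If you also want to choose the column names and keep using functools.partial as in the question, one possible variation is sketched below. The x_col and y_col parameters and the names higher, lower, grouped and result are made up for this illustration; they are not part of the original function.

```python
from functools import partial

import pandas as pd


def maxx(gdf, x_col, y_col, take_higher):
    # gdf is the sub-DataFrame for one group; x_col and y_col are
    # hypothetical parameters naming the two columns to compare.
    x_max = gdf[x_col].max()
    y_max = gdf[y_col].max()
    return max(x_max, y_max) if take_higher else min(x_max, y_max)


df = pd.DataFrame({
    'cat': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'x': [10, 15, 5, 11, 0, 4.3, 5.1, 8, 10, 12],
    'y': [1, 3, 5, 1, 0, 4.3, 1, 0, 2, 2],
    'z': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Fix the column names and the flag up front with partial.
higher = partial(maxx, x_col='x', y_col='y', take_higher=True)
lower = partial(maxx, x_col='x', y_col='y', take_higher=False)

# Selecting the columns first means each group passed to apply
# contains only x and y.
grouped = df.groupby('cat')[['x', 'y']]
result = pd.DataFrame({
    'maxx_max': grouped.apply(higher),
    'maxx_min': grouped.apply(lower),
})
print(result)
```

Each apply call returns a Series indexed by cat, so the two results line up when combined into one DataFrame.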