Pandas dataframe: pass multiple columns to an aggregate function for each group

Time:09-14

import pandas as pd
from functools import partial

def maxx(x, y, take_higher):
    """
    :param x: some column in the df
    :param y: some column in the df
    :param take_higher: bool
    :return: if take_higher is True: max(max(x), max(y)), else: min(max(x), max(y))
    """
    pass


df = pd.DataFrame({
    'cat': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'x': [10, 15, 5, 11, 0, 4.3, 5.1, 8, 10, 12],
    'y': [1, 3, 5, 1, 0, 4.3, 1, 0, 2, 2],
    'z': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

My goal is to apply the maxx function to each group (grouped by cat). It should take BOTH columns x and y as input, and I would like to specify which column names are passed as x and y. I would also like to pass the take_higher parameter (that is why I imported functools.partial, so the function can be wrapped with the parameter already set). Lastly, I would like to apply the function with both take_higher=True and take_higher=False.

I am trying to do something like:

df.groupby(df.cat).agg(partial(maxx, take_higher=True), partial(maxx, take_higher=False))

but it does not work, and I don't know how to specify which columns the function should use. How can I do this?

CodePudding user response:

You can use apply, which passes each group to the function as a DataFrame, so the function can read both columns:

def maxx(gdf, take_higher):
    # gdf is the sub-DataFrame for one group
    if take_higher:
        return max(max(gdf.x), max(gdf.y))
    else:
        return min(max(gdf.x), max(gdf.y))

df.groupby(df.cat).apply(lambda g: maxx(g, take_higher=False))

# do both aggregations in one call
df.groupby(df.cat).apply(lambda g: pd.Series({'maxx_min': maxx(g, take_higher=False), 'maxx_max': maxx(g, take_higher=True)}))
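
The question also asked how to choose the column names and how to use functools.partial, while the answer above hard-codes x and y. One possible way to cover both is sketched below. The parameter names x_col and y_col and the helper names maxx_min and maxx_max are made up for this illustration and are not part of the original answer. The last few lines show a vectorized alternative that should give the same numbers for this particular function, since it only needs the per-group maxima.

```python
from functools import partial

import pandas as pd


def maxx(gdf, x_col, y_col, take_higher):
    """Combine the per-group maxima of two columns.

    x_col and y_col are hypothetical parameters naming the columns to compare.
    """
    max_x = gdf[x_col].max()
    max_y = gdf[y_col].max()
    return max(max_x, max_y) if take_higher else min(max_x, max_y)


df = pd.DataFrame({
    'cat': [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'x': [10, 15, 5, 11, 0, 4.3, 5.1, 8, 10, 12],
    'y': [1, 3, 5, 1, 0, 4.3, 1, 0, 2, 2],
    'z': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Fix the column names and the flag up front; only the group is left to pass in.
maxx_min = partial(maxx, x_col='x', y_col='y', take_higher=False)
maxx_max = partial(maxx, x_col='x', y_col='y', take_higher=True)

# Select only the needed columns so each group contains just x and y.
result = df.groupby('cat')[['x', 'y']].apply(
    lambda g: pd.Series({'maxx_min': maxx_min(g), 'maxx_max': maxx_max(g)})
)
print(result)

# Vectorized alternative: take per-group maxima first, then combine across columns.
group_max = df.groupby('cat')[['x', 'y']].max()
vectorized = pd.DataFrame({
    'maxx_min': group_max.min(axis=1),
    'maxx_max': group_max.max(axis=1),
})
print(vectorized)
```

For larger frames the vectorized form is likely to be faster, because apply calls the Python function once per group. The apply form is more general and still works when the function cannot be expressed with built-in aggregations.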