Use entire groupby object on custom function

Time:03-10

I have a data frame that looks something like:

df =

date           col1      col2      col3
---------------------------------------
2022/03/01     1         5         10
2022/03/01     3         6         12
2022/03/01     5         7         14
2022/03/02     6         8         15
2022/03/02     2         9         17
2022/03/02     8         10        19
2022/03/03     2         11        21
2022/03/03     10        12        22
2022/03/03     9         13        23

I then have a function that looks something like:

def my_func(df):
    # <do something with the `df` given to the function>

    return result

In my case, the result is a single float calculated from several operations on the data frame passed in.

What I would like to do is group the original data frame by date, pass each group to the function as a data frame, and write the returned value to every row of that group, i.e. the resulting data frames would look something like:

df_group_object1 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/01     1         5         10       15
2022/03/01     3         6         12       15
2022/03/01     5         7         14       15


df_group_object2 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/02     6         8         15       25
2022/03/02     2         9         17       25
2022/03/02     8         10        19       25


df_group_object3 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/03     2         11        21       56
2022/03/03     10        12        22       56
2022/03/03     9         13        23       56

The values in the result column are placeholders that I made up. The real values would come from my_func.

My idea was to do something like:

df["result"] = df.groupby(["date"]).transform(my_func)

But it seems that what gets passed to the function is not the entire data frame for each group, as I expected.

So is there a way to do this?

CodePudding user response:

Assuming you want to do operations on the grouped DataFrames and then collect the results, you could just use a for loop on the groupby object:

import pandas as pd

df = pd.DataFrame({'col1':[1,1,2,2,3], 'col2':[1,2,3,4,5]})
def my_func(df):
    return df['col2'] + 1

# let's say you want to groupby col1 and operate on the rest of the columns
group_object = []
for group_name, df_chunk in df.groupby('col1'):
    df_chunk = df_chunk.copy()  # work on a copy to avoid a possible SettingWithCopyWarning
    df_chunk['result'] = my_func(df_chunk)
    group_object.append(df_chunk)

group_object[0]:

    col1    col2    result
0   1       1       2
1   1       2       3
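
For the case in the question, where my_func returns one float per group and that value should appear on every row of the group, the same loop idea could be sketched roughly as below. The body of my_func here is a made-up stand-in (the question does not show the real computation), and the `results` dict is a name introduced only for this illustration.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "date": ["2022/03/01"] * 3 + ["2022/03/02"] * 3 + ["2022/03/03"] * 3,
    "col1": [1, 3, 5, 6, 2, 8, 2, 10, 9],
    "col2": [5, 6, 7, 8, 9, 10, 11, 12, 13],
    "col3": [10, 12, 14, 15, 17, 19, 21, 22, 23],
})

def my_func(chunk):
    # Hypothetical stand-in: any computation that reduces the whole
    # group DataFrame to a single float would go here.
    return float(chunk["col1"].sum() + chunk["col3"].max())

# Each df_chunk is the full sub-DataFrame for one date,
# so my_func sees all columns of the group at once.
results = {date: my_func(df_chunk) for date, df_chunk in df.groupby("date")}

# Broadcast each group's single value back to all rows of that group.
df["result"] = df["date"].map(results)
```

Mapping the per-date values back through the date column keeps the original row order and index. `df.groupby("date").apply(my_func)` is another way to have each group handed to the function as a DataFrame, though whether the grouping column is included in that DataFrame depends on the pandas version.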