I have a data frame that looks something like:
df =
date col1 col2 col3
---------------------------------------
2022/03/01 1 5 10
2022/03/01 3 6 12
2022/03/01 5 7 14
2022/03/02 6 8 15
2022/03/02 2 9 17
2022/03/02 8 10 19
2022/03/03 2 11 21
2022/03/03 10 12 22
2022/03/03 9 13 23
I then have a function that looks something like:
def my_func(df):
    # <do something with the `df` given to the function>
    return result
In my case, the result is just a single float calculated by doing several things to the data frame passed in.
What I would like to do is group the original data frame by date, pass each group to the function as a data frame, and write the returned value to every row of that group, i.e. the resulting data frames would look something like:
df_group_object1 =
date col1 col2 col3 result
--------------------------------------------------
2022/03/01 1 5 10 15
2022/03/01 3 6 12 15
2022/03/01 5 7 14 15
df_group_object2 =
date col1 col2 col3 result
--------------------------------------------------
2022/03/02 6 8 15 25
2022/03/02 2 9 17 25
2022/03/02 8 10 19 25
df_group_object3 =
date col1 col2 col3 result
--------------------------------------------------
2022/03/03 2 11 21 56
2022/03/03 10 12 22 56
2022/03/03 9 13 23 56
Here the result column just contains made-up values. The real values would come from my_func.
My idea was to do something like:
df["result"] = df.groupby(["date"]).transform(my_func)
But it seems that what gets passed to the function is not the entire data frame of each group, as I had expected.
Is there a way to do this?
CodePudding user response:
Assuming you want to do operations on the grouped DataFrames and then collect the results, you could just use a for loop on the groupby object:
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 2, 3], 'col2': [1, 2, 3, 4, 5]})

def my_func(df):
    # example computation: add 1 to every value in col2
    return df['col2'] + 1

# let's say you want to groupby col1 and operate on the rest of the columns
group_object = []
for group_name, df_chunk in df.groupby('col1'):
    df_chunk = df_chunk.copy()  # work on a copy to avoid SettingWithCopyWarning
    df_chunk['result'] = my_func(df_chunk)
    group_object.append(df_chunk)
group_object[0] gives:
col1 col2 result
0 1 1 2
1 1 2 3
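If my_func returns a single float per group, as described in the question, one possible variation is to compute one value per date and then map it back onto every row. The sketch below rebuilds the question's data frame; the body of my_func is a hypothetical stand-in (the real calculation is not shown in the question), so the numbers it produces are only illustrative. Note that transform can call the function on individual columns rather than on the whole sub-frame, which is likely why it did not behave as expected.

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    "date": ["2022/03/01"] * 3 + ["2022/03/02"] * 3 + ["2022/03/03"] * 3,
    "col1": [1, 3, 5, 6, 2, 8, 2, 10, 9],
    "col2": [5, 6, 7, 8, 9, 10, 11, 12, 13],
    "col3": [10, 12, 14, 15, 17, 19, 21, 22, 23],
})

def my_func(group):
    # Hypothetical stand-in for the real calculation:
    # any function that takes a data frame and returns one float fits here
    return float(group["col1"].sum() + group["col2"].mean())

# Call my_func once per date, passing the complete sub-frame for that date
per_date = {date: my_func(chunk) for date, chunk in df.groupby("date")}

# Write each date's value to every row that has that date
df["result"] = df["date"].map(per_date)
```

This keeps everything in the original data frame instead of a list of chunks. If you still need the separate pieces, you can get them afterwards with [chunk for _, chunk in df.groupby("date")].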