I have the following df:
DateTime Var1
0 2021-08-01 10:00:00 115.0
1 2021-08-01 11:00:00 99.0
2 2021-08-01 12:00:00 155.0
3 2021-08-01 13:00:00 73.0
4 2021-08-01 14:00:00 44.0
5 2021-08-02 10:00:00 112.0
6 2021-08-02 11:00:00 100.0
7 2021-08-02 12:00:00 150.0
8 2021-08-02 13:00:00 70.0
9 2021-08-02 14:00:00 45.0
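(For anyone reproducing this, a minimal construction of the df might look like the following; the dtypes are my assumption, with DateTime parsed as datetime64.)

import pandas as pd

df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2021-08-01 10:00:00", "2021-08-01 11:00:00", "2021-08-01 12:00:00",
        "2021-08-01 13:00:00", "2021-08-01 14:00:00",
        "2021-08-02 10:00:00", "2021-08-02 11:00:00", "2021-08-02 12:00:00",
        "2021-08-02 13:00:00", "2021-08-02 14:00:00",
    ]),
    "Var1": [115.0, 99.0, 155.0, 73.0, 44.0,
             112.0, 100.0, 150.0, 70.0, 45.0],
})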
I need to group the data by date (not date and time) and apply my my_group function to each date.
This is how I did it:
import numpy as np

def my_group(group):
    if len(group) == 0:
        return np.nan
    return group["Var1"] / len(group)
result = (
    df[["DateTime", "Var1"]]
    .assign(date=lambda x: x["DateTime"].dt.date)
    .groupby("date")
    .apply(my_group)
    .reset_index()
)
result.head()
But instead of grouping by date, the records seem to be grouped differently, because I see the same date duplicated in the result (the Var1 values come from my original df):
date level_1 Var1
0 2021-08-01 0 0.016767
1 2021-08-01 1 0.014398
CodePudding user response:
The output is expected, because:
return group["Var1"]/len(group)
returns a Series with one value per row of the group (like the original DataFrame), not a single value per date.
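A quick way to see this (a minimal sketch against the sample df above; the exact index layout of apply can vary a little across pandas versions):

per_row = (
    df.assign(date=df["DateTime"].dt.date)
      .groupby("date")
      .apply(my_group)
)
print(per_row.index)
# MultiIndex with one (date, original row) pair per record, so
# reset_index() repeats each date once per original row.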
You need an aggregation instead, e.g. sum:
return group["Var1"].sum()/len(group)
which is the same as:
return group["Var1"].mean()
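For completeness, a sketch of the fixed function dropped back into the asker's pipeline (the rename("Var1") is my addition, so the aggregated column keeps its original name):

def my_group(group):
    if len(group) == 0:              # defensive check kept from the question
        return np.nan
    return group["Var1"].mean()      # scalar per group -> one row per date

result = (
    df.assign(date=df["DateTime"].dt.date)
      .groupby("date")
      .apply(my_group)
      .rename("Var1")
      .reset_index()
)
print(result)
#          date  Var1
# 0  2021-08-01  97.2
# 1  2021-08-02  95.4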
I think the length of a group is never 0 here, so the solution can be simplified:
result = (
    # rename("date") names the grouping key, so reset_index() yields a "date" column
    df.groupby(df["DateTime"].dt.date.rename("date"))["Var1"]
    .mean()
    .reset_index()
)
print(result)
date Var1
0 2021-08-01 97.2
1 2021-08-02 95.4
Another solution with Grouper:
result = (
df.groupby(pd.Grouper(freq='d', key="DateTime"))["Var1"]
.mean()
.rename_axis('date')
.reset_index()
)
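One difference worth noting: pd.Grouper(freq='d') labels the groups with midnight Timestamps (datetime64), while dt.date in the first solution yields plain Python date objects. If plain dates are preferred, a conversion along these lines should work:

# optional: turn the midnight Timestamps back into date objects
result["date"] = result["date"].dt.date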