Assume I have the following data:
id date value
1 2020-01-22 20
1 2020-03-12 18
1 2020-03-25 16
2 2020-04-22 20
2 2020-04-22 23
First, I wish to group by id and date and sum the values for each distinct date. Then I want to group by id, sum the total value, and divide it by the count of distinct months in date.
The first part is easy. I can simply do: df.groupby(["id", "date"]).sum(). I then get the following:
value
id date
1 2020-01-22 20
1 2020-03-12 18
1 2020-03-25 16
2 2020-04-22 43
But I do not only want the aggregate; I want the sum divided by the number of unique months in date. My idea for counting the unique months would be: len(pd.to_datetime(df["date"]).dt.to_period('M').unique()). However, I have no idea how to combine the two.
Basically, the output I'm looking for is:
id value_after_division
1 27
2 43
In simpler terms: 27 = (20 + 18 + 16) / 2 and 43 = 43 / 1.
CodePudding user response:
You'd like to aggregate the "number of unique months" and the "total value" per id, and then divide the total by the month count. You already have the total-value part. For the month count, it helps to add a (temporary) column indicating the month. So we go:
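import pandas as pd

# get hold of a grouper object after making the month available;
# pd.to_datetime covers the case where "date" is stored as strings, and
# to_period("M") keeps the year, so 2020-01 and 2021-01 count as two months
g = df.assign(month=pd.to_datetime(df["date"]).dt.to_period("M")).groupby("id")
# aggregate
nuniq_mon = g["month"].nunique()
total_val = g["value"].sum()
# div is the method form of /
result = total_val.div(nuniq_mon)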
to get
>>> result
id
1 27.0
2 43.0
dtype: float64
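For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and does both aggregations in a single agg call. The function name value_per_month and the intermediate column names total and n_months are made up for illustration, and the sketch assumes date may be stored as strings, hence the pd.to_datetime call.

```python
import pandas as pd

# Sample data from the question; "date" is kept as strings on purpose.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": ["2020-01-22", "2020-03-12", "2020-03-25", "2020-04-22", "2020-04-22"],
    "value": [20, 18, 16, 20, 23],
})


def value_per_month(frame: pd.DataFrame) -> pd.Series:
    """Total value per id divided by the number of distinct year-months per id."""
    # Year-month period, e.g. 2020-03; unlike .dt.month it keeps the year.
    month = pd.to_datetime(frame["date"]).dt.to_period("M")
    grouped = frame.assign(month=month).groupby("id")
    # Named aggregation: one column for the sum, one for the distinct-month count.
    per_id = grouped.agg(total=("value", "sum"), n_months=("month", "nunique"))
    return (per_id["total"] / per_id["n_months"]).rename("value_after_division")


result = value_per_month(df)
print(result)
```

Calling result.reset_index() turns this into a two-column frame with id and value_after_division, which is closer to the layout asked for in the question. Note that summing per date first is not needed here: the per-id total is the same either way.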