Group by, summarise and divide by the number of distinct months using Python

Assume I have the following data:

id  date        value
1   2020-01-22  20
1   2020-03-12  18
1   2020-03-25  16
2   2020-04-22  20
2   2020-04-22  23

First, I want to group by id and date and sum the values for each distinct date. Then I want to group by id, sum the total value, and divide it by the number of distinct months in date.

The first part is easy. I can simply do: df.groupby(["id", "date"]).sum(). I then get the following:

                value
id  date        
1   2020-01-22  20
1   2020-03-12  18
1   2020-03-25  16
2   2020-04-22  43

However, I don't just want the aggregate; I want the sum divided by the number of unique months in date. My idea for counting the unique months is: len(pd.to_datetime(df["date"]).dt.to_period('M').unique()). I have no idea how to combine the two, though.

Basically, the output I'm looking for is:

id  value_after_division
1   27
2   43

In simpler terms: 27 = (20 + 18 + 16) / 2 and 43 = 43 / 1.

CodePudding user response：

You want two per-id aggregates, the number of unique months and the total value, and then the first divided into the second. You already have the total. For the month count, it helps to add a (temporary) column holding the month of each date. So we go:

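import pandas as pd

# add a year-month column (parsing date first, in case it is stored as strings),
# then get hold of a grouper object
g = df.assign(month=pd.to_datetime(df["date"]).dt.to_period("M")).groupby("id")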

# aggregate
nuniq_mon = g["month"].nunique()
total_val = g["value"].sum()

# div is the method form of /
result = total_val.div(nuniq_mon)

to get

>>> result

id
1    27.0
2    43.0
dtype: float64
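
If you'd rather do both aggregations in one go, here is a minimal self-contained sketch using named aggregation. It rebuilds the sample data from the question with date as plain strings (an assumption about how it is stored), and the names with_month, per_id, total_value and n_months are just illustrative choices, not anything from the question. It uses year-month periods, as in the question's own to_period('M') idea, so the same month in different years is counted separately.

```python
import pandas as pd

# Sample data from the question; dates are assumed to start out as strings.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": ["2020-01-22", "2020-03-12", "2020-03-25", "2020-04-22", "2020-04-22"],
    "value": [20, 18, 16, 20, 23],
})

# Year-month periods keep e.g. 2020-01 and 2021-01 distinct.
with_month = df.assign(month=pd.to_datetime(df["date"]).dt.to_period("M"))

# Named aggregation: total value and number of distinct months per id.
per_id = with_month.groupby("id").agg(
    total_value=("value", "sum"),
    n_months=("month", "nunique"),
)

# value_after_division is the column name from the question's desired output.
per_id["value_after_division"] = per_id["total_value"] / per_id["n_months"]
result = per_id["value_after_division"].reset_index()
print(result)
```

Keeping total_value and n_months around in per_id makes it easy to check the intermediate numbers (54 and 2 for id 1, 43 and 1 for id 2) before dividing.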