Existing Dataframe :
Id Date_of_activity
A 2020-09-17 12:36:00
A 2020-11-02 00:00:00
A 2020-12-02 00:00:00
A 2021-01-02 00:00:00
A 2021-02-02 00:00:00
A 2021-03-03 12:12:00
A 2021-04-03 12:12:00
B 2020-11-02 00:00:00
B 2021-01-02 00:00:00
B 2021-03-03 12:12:00
B 2021-04-03 12:12:00
Expected Dataframe :
Id Missed_Month_Count
A 1
B 2
I am looking to calculate the Number of Missed Months where NO activity was Done. For Id A , No activity was done in 10th Month of 2020 so the missed month count should be 1 , likewise for B , No activity was done in 12th month of 2020 and 2nd month of 2021 , which makes missed_month_count as 2.
CodePudding user response:
You can use:
# convert to Monthly period
s = pd.to_datetime(df['Date_of_activity']).dt.to_period('M')
# compute the difference per group
# if != 1, then there is a missing month
out = (s.sort_values()
.groupby(df['Id'], sort=False)
.apply(lambda g: g.drop_duplicates().diff().ne('M').sum()-1)
.reset_index(name='Missed_Month_Count')
)
output:
Id Missed_Month_Count
0 A 1
1 B 2