The columns in the below dataset will represent:
A: Date contract opened;
B: Date contract stops;
C: Unique account ID against which contract associated (can have multiple contracts live against one ID)
D: Monthly revenue for contract period - for simplicity, assume revenue generated from first month contract assumed up to month before the date the contract closes
Start Date contract end date Unique Account Field MRR
1/2/2013 1/2/2015 50e55 195.00
1/2/2013 1/2/2014 4ee75 50.00
1/2/2013 1/2/2014 4f031 75.00
1/2/2013 1/2/2016 4c3b2 133.00
1/2/2013 1/2/2016 49ec8 132.00
1/3/2013 1/3/2014 49fc8 59.00
1/4/2013 1/4/2015 49wc8 87.00
12/27/2013 12/27/2014 50bf7 190.00
12/27/2013 12/27/2014 49cc8 179.00
12/27/2013 12/27/2014 49wc8 147.00
etc....
I would like to calculate the following:
1.How much revenue was generated by month between Jan-2013 and Dec-2014?
2.How many active contracts (generated revenue in that month) were there by month between Jan-2013 and Dec-2014?
3.How many active accounts (generated revenue from at least one contract) were there by month between Jan-2013 and Dec-2014?
I tried the below code: I was able to use sum() to get the revenues, but I'm not sure what to do beyond this.
from datetime import date
df['date'] = pd.to_datetime(df['Start Date'])
df.groupby(df['Start Date'].dt.strftime('%B'))['MRR'].sum().sort_values()
Result I got from the above code:
Start Date
February 221744
January 241268
July 245811
August 247413
April 249702
March 251219
June 251494
May 259149
September 263395
October 293990
November 296590
December 311659
I need to calculate the above following. How can I achieve this in python?
CodePudding user response:
Maybe you want something like this?
date_range = (df['date'] >= "2013-01-01") & (df['date'] <= "2014-12-31")
df[date_range].groupby(df['date'].dt.strftime('%B')).agg(
MRR=('MRR', 'sum'),
Contracts=('date', 'count'),
Accounts=('Unique Account Field', 'nunique')
)
Output :-
MRR Contracts Accounts
date
December 516.0 3 3
January 731.0 7 7