I have been presented with a very small dataset that has the date
of each time a user
logs into a system, I have to use this data set to create a table where I can show for each log-in the cumulative monthly counts of logs and the overall cumulative counts of logs, this is the data set I have:
date | user |
---|---|
1/01/2022 | Mark |
2/01/2022 | Mark |
3/02/2022 | Mark |
4/02/2022 | Mark |
5/03/2022 | Mark |
6/03/2022 | Mark |
7/03/2022 | Mark |
8/03/2022 | Mark |
9/03/2022 | Mark |
and this is my desired output:
row | date | user | monthly_track | acum_track |
---|---|---|---|---|
1 | 1/01/2022 | Mark | 1 | 1 |
2 | 2/01/2022 | Mark | 2 | 2 |
3 | 3/02/2022 | Mark | 1 | 3 |
4 | 4/02/2022 | Mark | 2 | 4 |
5 | 5/03/2022 | Mark | 1 | 5 |
6 | 6/03/2022 | Mark | 2 | 6 |
7 | 7/03/2022 | Mark | 3 | 7 |
8 | 8/03/2022 | Mark | 4 | 8 |
9 | 9/03/2022 | Mark | 5 | 9 |
Why? Let's look at the row number 5. This is the first time the user Mark has logged into the system during the month 3 (March) but it is the 5th overall login in the data set (for the purpose of learning there will only be one year (2022).
I have no idea as to how to get the monthly and overall count together. I can groupby
user
and sort
by date
to count how many times in total a user has logged in, but I know that in order to achive my desired output I will have to group by date
and user
and then make counts based on month but I will have to somehow group the data by user (only) to get the overall count and I dont think I could group twice the data.
CodePudding user response:
First you need to convert date
to actual datetime
values with to_datetime
. The rest is simple with groupby
and cumcount
:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['monthly_count'] = df.groupby([df['user'], df['date'].dt.year, df['date'].dt.month]).cumcount() 1
df['acum_count'] = df.groupby('user').cumcount() 1
Output:
>>> df
date user monthly_count acum_count
0 2022-01-01 Mark 1 1
1 2022-01-02 Mark 2 2
2 2022-02-03 Mark 1 3
3 2022-02-04 Mark 2 4
4 2022-03-05 Mark 1 5
5 2022-03-06 Mark 2 6
6 2022-03-07 Mark 3 7
7 2022-03-08 Mark 4 8
8 2022-03-09 Mark 5 9