I have a data frame of users with one row for each time they entered a website. It looks like this
(if there are x rows with the same week and date, it means the user entered the site x times on that date):
ID | week | date |
---|---|---|
1 | 2 | 20/07/21 |
2 | 3 | 23/07/21 |
2 | 3 | 23/07/21 |
2 | 3 | 26/07/21 |
2 | 4 | 30/07/21 |
2 | 4 | 30/07/21 |
2 | 4 | 30/07/21 |
2 | 4 | 31/07/21 |
So far I've managed to get this:
ID | week | date | days number |
---|---|---|---|
1 | 2 | 20/07/21 | 1 |
2 | 3 | 23/07/21 | 2 |
2 | 3 | 26/07/21 | 1 |
2 | 4 | 30/07/21 | 3 |
2 | 4 | 31/07/21 | 1 |
using this code:
df.groupby(['ID','week','date']).agg({'date':['count']})
But I need to calculate the mean number of times per day each user used the site within each week, so that each user has one row per week. Therefore the output I need looks like this:
ID | week | mean days number |
---|---|---|
1 | 2 | 1 |
2 | 3 | 1.5 |
2 | 4 | 2 |
Any ideas how to continue?
Thanks!!
CodePudding user response:
Use:
(df.groupby(['ID', 'week', 'date'], as_index=False)['date']
   .agg('count')
   .groupby(['ID', 'week'], as_index=False)
   .agg(**{'mean days number': ('date', 'mean')})
)
Output:
   ID  week  mean days number
0   1     2               1.0
1   2     3               1.5
2   2     4               2.0
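If it helps to see the two steps separately, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample frame from the question's table and uses `size()` for the per-date count instead of counting the `date` column itself, so the count lands in its own column. The names `visits` and `result` are only illustrative, and this is one possible way to write it rather than the only one.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ID":   [1, 2, 2, 2, 2, 2, 2, 2],
    "week": [2, 3, 3, 3, 4, 4, 4, 4],
    "date": ["20/07/21", "23/07/21", "23/07/21", "26/07/21",
             "30/07/21", "30/07/21", "30/07/21", "31/07/21"],
})

# Step 1: number of visits per user, week and date
# (size() counts the rows in each group)
visits = (
    df.groupby(["ID", "week", "date"])
      .size()
      .reset_index(name="days number")
)

# Step 2: average those daily counts per user and week
result = (
    visits.groupby(["ID", "week"], as_index=False)
          .agg(**{"mean days number": ("days number", "mean")})
)

print(result)
```

With the sample data this should give 1.0, 1.5 and 2.0 for the three (ID, week) pairs, matching the output above. Keeping the intermediate `visits` frame around also makes it easy to check the per-date counts before averaging.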