So I have a dataset that looks like this:
class date
A 2018-01-01
B 2018-03-05
A 2018-01-03
A 2018-01-05
B 2018-03-10
A 2018-01-07
I want to calculate the mean difference between consecutive dates for each class using pandas. For example, for class A we have:
2018-01-01, 2018-01-03, 2018-01-05 and 2018-01-07
The difference between each pair of consecutive dates is 2 days, so the mean is also 2 days.
What I expect to get is a grouped dataframe, like the following:
class mean
A 2
B 5
I have tried df.groupby('class')['date'].diff().fillna(pd.Timedelta(seconds=0)).mean(), but it doesn't return the expected output.
CodePudding user response:
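You can take the diff within each group and average it. mean() skips the NaT that diff() leaves on the first row of each group, so no fillna is needed (this assumes the date column already has a datetime dtype):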
df.groupby('class', as_index=False)['date']\
.apply(lambda x: x.diff().mean())
Output:
class date
0 A 2 days
1 B 5 days
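For reference, here is a self-contained sketch that rebuilds the sample data from the question and returns the mean gap as a plain number of days in a column named mean. The original attempt gives a single value because diff() on a groupby returns an ordinary row-aligned Series that is no longer grouped, and the filled-in zeros pull the average down. The names gaps, gap_days and result are only illustrative, and the sort step is an assumption that the dates may not already be ordered within each class.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "class": ["A", "B", "A", "A", "B", "A"],
    "date": ["2018-01-01", "2018-03-05", "2018-01-03",
             "2018-01-05", "2018-03-10", "2018-01-07"],
})
df["date"] = pd.to_datetime(df["date"])  # diff() needs a datetime dtype

# Assumption: sort so each diff is between consecutive dates of a class
df = df.sort_values(["class", "date"])

# Per-class gaps; the first row of each class is NaT
gaps = df.groupby("class")["date"].diff()

# Convert to float days (NaT becomes NaN), then average per class.
# mean() skips NaN, so the leading NaT does not affect the result.
gap_days = gaps / pd.Timedelta(days=1)
result = (
    gap_days.groupby(df["class"])
    .mean()
    .rename("mean")
    .reset_index()
)
print(result)
```

Dividing by pd.Timedelta(days=1) keeps fractional days; if whole days are enough, the float column can be rounded or cast afterwards.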