Getting days difference from dates in same column in a groupby in Pandas-CodePudding

I have a pandas dataframe like this:

user_id	code	earliest_date
A	N18.1	2016-04-08
A	N18.2	2017-06-08
A	N18.3	2018-04-08
B	N18.1	2016-06-16
B	N18.2	2016-08-16
C	N18.1	2021-01-11

In the code columns the values are always form a specified list, here [N18.1,N18.2,N18.3] etc.

I want a dataframe that displays the days it takes to progress from code N18.x to the next one in sequence for each user. I am thinking something like this:

user_id	N18.1 -> N18.2	N18.2 -> N18.3
A	426	304
B	60	n/a
C	n/a	n/a

Your help will be much appreciated.

CodePudding user response：

Use DataFrameGroupBy.diff for differecies per groups, create helper column for shifted code and pass to DataFrame.pivot:

df['earliest_date'] = pd.to_datetime(df['earliest_date'])

df['diff'] = df.groupby('user_id')['earliest_date'].diff().dt.days
df['code1'] = df.groupby('user_id')['code'].shift()   '->'   df['code']

df = df.pivot('user_id','code1','diff').drop(np.nan, axis=1)
print (df)
code1    N18.1->N18.2  N18.2->N18.3
user_id                            
A               426.0         304.0
B                61.0           NaN
C                 NaN           NaN