Home > database >  Getting days difference from dates in same column in a groupby in Pandas
Getting days difference from dates in same column in a groupby in Pandas

Time:12-11

I have a pandas dataframe like this:

user_id code earliest_date
A N18.1 2016-04-08
A N18.2 2017-06-08
A N18.3 2018-04-08
B N18.1 2016-06-16
B N18.2 2016-08-16
C N18.1 2021-01-11

In the code columns the values are always form a specified list, here [N18.1,N18.2,N18.3] etc.

I want a dataframe that displays the days it takes to progress from code N18.x to the next one in sequence for each user. I am thinking something like this:

user_id N18.1 -> N18.2 N18.2 -> N18.3
A 426 304
B 60 n/a
C n/a n/a

Your help will be much appreciated.

CodePudding user response:

Use DataFrameGroupBy.diff for differecies per groups, create helper column for shifted code and pass to DataFrame.pivot:

df['earliest_date'] = pd.to_datetime(df['earliest_date'])

df['diff'] = df.groupby('user_id')['earliest_date'].diff().dt.days
df['code1'] = df.groupby('user_id')['code'].shift()   '->'   df['code']

df = df.pivot('user_id','code1','diff').drop(np.nan, axis=1)
print (df)
code1    N18.1->N18.2  N18.2->N18.3
user_id                            
A               426.0         304.0
B                61.0           NaN
C                 NaN           NaN
  • Related