Split pandas dataframe date column into start_date & end_date by group

Time:10-22

I have a dataframe which looks something like this:

S.No  date          origin  dest    journeytype
1     2021-10-21    FKG      HYM    OP
2     2021-10-21    FKG      HYM    PK
3     2021-10-21    HYM      LDS    OP
4     2021-10-22    FKG      HYM    OP
5     2021-10-22    FKG      HYM    PK
6     2021-10-22    HYM      LDS    OP
7     2021-10-23    FKG      HYM    OP
8     2021-10-24    AVM      BLA    OP
9     2021-10-24    AVM      DBL    OP
10    2021-10-27    AVM      BLA    OP

For each combination of origin, dest & journeytype, I need to collapse the date column into start_date & end_date columns.

Output dataframe for the above input should look like:

start_date  end_date   origin   dest    journeytype
2021-10-21  2021-10-23  FKG     HYM     OP
2021-10-21  2021-10-22  FKG     HYM     PK
2021-10-21  2021-10-22  HYM     LDS     OP
2021-10-24  2021-10-24  AVM     BLA     OP
2021-10-24  2021-10-24  AVM     DBL     OP
2021-10-27  2021-10-27  AVM     BLA     OP

Also, if the dates for any group are non-continuous, each continuous run needs to be shown as a separate record in the result.

CodePudding user response:

Convert the date column to datetimes if necessary, then aggregate min and max per group with GroupBy.agg, and finally reorder the columns by selecting them with a list:

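import pandas as pd

# Parse the date strings so min/max compare chronologically
df['date'] = pd.to_datetime(df['date'])

# One row per (origin, dest, journeytype); sort=False keeps first-appearance order
df = (df.groupby(['origin', 'dest', 'journeytype'], sort=False)['date']
        .agg(start_date='min', end_date='max')
        .reset_index())

# Put the date columns first
df = df[['start_date', 'end_date', 'origin', 'dest', 'journeytype']]
print(df)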
  start_date   end_date origin dest journeytype
0 2021-10-21 2021-10-23    FKG  HYM          OP
1 2021-10-21 2021-10-22    FKG  HYM          PK
2 2021-10-21 2021-10-22    HYM  LDS          OP
3 2021-10-24 2021-10-27    AVM  BLA          OP
4 2021-10-24 2021-10-24    AVM  DBL          OP
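
Note that grouping only by origin, dest and journeytype puts every date of a combination into a single row, so a gap in the dates (such as 2021-10-24 followed by 2021-10-27 for AVM/BLA/OP in the question's data) is not split into separate records. One possible way to handle the non-continuous case is sketched below. It assumes daily granularity, i.e. a gap of more than one day within a group starts a new record. The `keys` list and the `run` helper column are names introduced here for illustration only, and the sample frame is rebuilt from the rows in the question:

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": ["2021-10-21", "2021-10-21", "2021-10-21", "2021-10-22", "2021-10-22",
             "2021-10-22", "2021-10-23", "2021-10-24", "2021-10-24", "2021-10-27"],
    "origin": ["FKG", "FKG", "HYM", "FKG", "FKG", "HYM", "FKG", "AVM", "AVM", "AVM"],
    "dest": ["HYM", "HYM", "LDS", "HYM", "HYM", "LDS", "HYM", "BLA", "DBL", "BLA"],
    "journeytype": ["OP", "PK", "OP", "OP", "PK", "OP", "OP", "OP", "OP", "OP"],
})
df["date"] = pd.to_datetime(df["date"])

keys = ["origin", "dest", "journeytype"]  # hypothetical helper name

# Sort so that each group's dates are contiguous and chronological
df = df.sort_values(keys + ["date"])

# Flag rows whose distance to the previous date in the same group exceeds
# one day. The first row of each group has a NaT diff, which compares False.
gap = df.groupby(keys)["date"].diff().gt(pd.Timedelta(days=1))

# A running count of gaps gives an id that changes only where a new run
# starts; combined with the keys it identifies each continuous run.
df["run"] = gap.cumsum()

result = (df.groupby(keys + ["run"], sort=False)["date"]
            .agg(start_date="min", end_date="max")
            .reset_index()
            .drop(columns="run")
            .sort_values("start_date", kind="mergesort")  # stable sort
            .reset_index(drop=True))

result = result[["start_date", "end_date"] + keys]
print(result)
```

The final sort by start_date is optional; it is only there to order the records by when each run begins. If duplicate dates can occur inside a group they stay in the same run, since their difference is zero days.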