Combining continuous dates from two columns in pandas data frame-CodePudding

ID           Order_ID     statr_date            end_date                 Product    Sub_Product 
746              001      08-Oct-2019 0:00:00   *16-Nov-2019 0:00:00*     LPP       Abc
746              002      10-Oct-2019 0:00:00   02-Sep-2020 0:00:00       LPP       Abc
746              003      10-Oct-2019 0:00:00   11-Sep-2020 0:00:00       LPP       Abc
746              004      10-Oct-2019 0:00:00   08-Jan-2021 0:00:00       LPP       Abc
746              005    *16-Nov-2019 0:00:00*   17-Dec-2019 0:00:00       LPP       Abc

In the above dataset i want to search for continuous dates, say from the first row having end date 16-nov-2019, look for a continuous date going by the start date and combine them. Such that the final output looks like:

  ID         Order_ID     statr_date            end_date                 Product    Sub_Product 
746              001      08-Oct-2019 0:00:00   17-Dec-2019 0:00:00       LPP       Abc
746              002      10-Oct-2019 0:00:00   02-Sep-2020 0:00:00       LPP       Abc
746              003      10-Oct-2019 0:00:00   11-Sep-2020 0:00:00       LPP       Abc
746              004      10-Oct-2019 0:00:00   08-Jan-2021 0:00:00       LPP       Abc

How can i go about doing that?

CodePudding user response：

You can take a look at each pairwise combination of rows and then compare the start date of one row with the end date of another row to see if they fit the pattern

import numpy as np
from itertools import combinations

start_dates = pd.to_datetime(df['statr_date'])
end_dates = pd.to_datetime(df['end_date'])
order_ids = df['Order_ID']

dates = [(o, s, e) for o, s, e in zip(order_ids, start_dates, end_dates)]
od_map = dict.fromkeys(order_ids, np.nan)

for combo in combinations(dates, 2):
    ((order_first, start_first, end_first), (order_second, start_second, end_second)) = combo
    if end_first == start_second:
        od_map[order_first] = od_map[order_second] = min(order_first, order_second)

df['od_grp'] = df['Order_ID'].map(od_map).fillna(df['Order_ID'])
df_grp = df.groupby('od_grp').agg({'statr_date': 'min', 'end_date': 'max'})
print(df_grp)
#                 statr_date             end_date
#od_grp                                          
#1.0     08-Oct-2019 0:00:00  17-Dec-2019 0:00:00
#2.0     10-Oct-2019 0:00:00  02-Sep-2020 0:00:00
#3.0     10-Oct-2019 0:00:00  11-Sep-2020 0:00:00
#4.0     10-Oct-2019 0:00:00  08-Jan-2021 0:00:00

CodePudding user response：

You could also try with for-loops and df.loc[]

for start in df.statr_date:
    for end in df.end_date:
        if start==end:
            df.loc[df[df.end_date.eq(end)].index[0], 'end_date'] = df.loc[df[df.statr_date.eq(start)].index[0], 'end_date']
            df = df.drop(df[df.statr_date.eq(start)].index[0])


Output:
    ID  Order_ID           statr_date             end_date Product Sub_Product
0  746         1  08-Oct-2019 0:00:00  17-Dec-2019 0:00:00     LPP         Abc
1  746         2  10-Oct-2019 0:00:00  02-Sep-2020 0:00:00     LPP         Abc
2  746         3  10-Oct-2019 0:00:00  11-Sep-2020 0:00:00     LPP         Abc
3  746         4  10-Oct-2019 0:00:00  08-Jan-2021 0:00:00     LPP         Abc