ID Order_ID statr_date end_date Product Sub_Product
746 001 08-Oct-2019 0:00:00 *16-Nov-2019 0:00:00* LPP Abc
746 002 10-Oct-2019 0:00:00 02-Sep-2020 0:00:00 LPP Abc
746 003 10-Oct-2019 0:00:00 11-Sep-2020 0:00:00 LPP Abc
746 004 10-Oct-2019 0:00:00 08-Jan-2021 0:00:00 LPP Abc
746 005 *16-Nov-2019 0:00:00* 17-Dec-2019 0:00:00 LPP Abc
In the above dataset i want to search for continuous dates, say from the first row having end date 16-nov-2019, look for a continuous date going by the start date and combine them. Such that the final output looks like:
ID Order_ID statr_date end_date Product Sub_Product
746 001 08-Oct-2019 0:00:00 17-Dec-2019 0:00:00 LPP Abc
746 002 10-Oct-2019 0:00:00 02-Sep-2020 0:00:00 LPP Abc
746 003 10-Oct-2019 0:00:00 11-Sep-2020 0:00:00 LPP Abc
746 004 10-Oct-2019 0:00:00 08-Jan-2021 0:00:00 LPP Abc
How can i go about doing that?
CodePudding user response:
You can take a look at each pairwise combination of rows and then compare the start date of one row with the end date of another row to see if they fit the pattern
import numpy as np
from itertools import combinations
start_dates = pd.to_datetime(df['statr_date'])
end_dates = pd.to_datetime(df['end_date'])
order_ids = df['Order_ID']
dates = [(o, s, e) for o, s, e in zip(order_ids, start_dates, end_dates)]
od_map = dict.fromkeys(order_ids, np.nan)
for combo in combinations(dates, 2):
((order_first, start_first, end_first), (order_second, start_second, end_second)) = combo
if end_first == start_second:
od_map[order_first] = od_map[order_second] = min(order_first, order_second)
df['od_grp'] = df['Order_ID'].map(od_map).fillna(df['Order_ID'])
df_grp = df.groupby('od_grp').agg({'statr_date': 'min', 'end_date': 'max'})
print(df_grp)
# statr_date end_date
#od_grp
#1.0 08-Oct-2019 0:00:00 17-Dec-2019 0:00:00
#2.0 10-Oct-2019 0:00:00 02-Sep-2020 0:00:00
#3.0 10-Oct-2019 0:00:00 11-Sep-2020 0:00:00
#4.0 10-Oct-2019 0:00:00 08-Jan-2021 0:00:00
CodePudding user response:
You could also try with for-loops and df.loc[]
for start in df.statr_date:
for end in df.end_date:
if start==end:
df.loc[df[df.end_date.eq(end)].index[0], 'end_date'] = df.loc[df[df.statr_date.eq(start)].index[0], 'end_date']
df = df.drop(df[df.statr_date.eq(start)].index[0])
Output:
ID Order_ID statr_date end_date Product Sub_Product
0 746 1 08-Oct-2019 0:00:00 17-Dec-2019 0:00:00 LPP Abc
1 746 2 10-Oct-2019 0:00:00 02-Sep-2020 0:00:00 LPP Abc
2 746 3 10-Oct-2019 0:00:00 11-Sep-2020 0:00:00 LPP Abc
3 746 4 10-Oct-2019 0:00:00 08-Jan-2021 0:00:00 LPP Abc