I'm trying to make a new feature out from one date column to provide start and end date.
Here's what it looks like:
unique_id = ["001", "001", "001",
"002",
"003", "003"
]
end_dates = ["2018-10-31 12:43:03 PM", "2018-10-31 12:44:23 PM", "2018-10-31 1:01:42 PM",
"2018-11-23 03:33:13 PM",
"2018-11-23 04:10:45 PM", "2018-11-23 04:13:58 PM"
]
activity_class = ["step 1", "step 2", "step 3",
"step 1",
"step 1", "step 2"
]
df = \
pd.DataFrame({"ID": unique_id,
"Edit Date": end_dates,
"Activity": activity_class
})
df["Edit Date"] = pd.to_datetime(df["Edit Date"])
Here's how I want it to look like:
unique_id = ["001", "001", "001",
"002",
"003", "003"
]
start_date = ["2018-10-31 12:43:03 PM", "2018-10-31 12:43:03 PM", "2018-10-31 12:44:23 PM",
"2018-11-23 03:33:13 PM",
"2018-11-23 04:10:45 PM", "2018-11-23 04:10:45 PM"
]
end_date = ["2018-10-31 12:43:03 PM", "2018-10-31 12:44:23 PM", "2018-10-31 1:01:42 PM",
"2018-11-23 03:33:13 PM",
"2018-11-23 04:10:45 PM", "2018-11-23 04:13:58 PM"
]
activity_class = ["step 1", "step 2", "step 3",
"step 1",
"step 1", "step 2"
]
df = \
pd.DataFrame({"ID": unique_id,
"Start_Date": start_date,
"End_Date": end_date,
"Activity": activity_class
})
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
df["End_Date"] = pd.to_datetime(df["End_Date"])
What I tried so far:
df["Start_Date"] = df["Edit Date"].shift(1).backfill()
Some of the rules:
- Data is sorted ascending by unique id and date
- regardless of the label in "activity" as long as it's the first one, it the date should be the same for start and end
- next activity's start date should copy the previous activity's end date
CodePudding user response:
You are looking for groupby().shift()
?
df['Start_Date'] = df['Edit Date']
df['End_Date'] = df.groupby('ID')['Edit Date'].shift().fillna(df['Edit Date'])
Output:
ID Edit Date Activity Start_Date End_Date
0 001 2018-10-31 12:43:03 step 1 2018-10-31 12:43:03 2018-10-31 12:43:03
1 001 2018-10-31 12:44:23 step 2 2018-10-31 12:44:23 2018-10-31 12:43:03
2 001 2018-10-31 13:01:42 step 3 2018-10-31 13:01:42 2018-10-31 12:44:23
3 002 2018-11-23 15:33:13 step 1 2018-11-23 15:33:13 2018-11-23 15:33:13
4 003 2018-11-23 16:10:45 step 1 2018-11-23 16:10:45 2018-11-23 16:10:45
5 003 2018-11-23 16:13:58 step 2 2018-11-23 16:13:58 2018-11-23 16:10:45