I have a pandas DataFrame with several columns, including a sequence number that is counted per system_name:
id |time               |system_name|status |sequence
--- ------------------- ----------- ------- --------
001|2022-06-01 10:00:12|A          |RUNNING|1
002|2022-06-01 10:01:02|B          |RUNNING|1
003|2022-06-01 10:01:45|A          |DONE   |2
004|2022-06-01 10:01:52|B          |DONE   |2
005|2022-06-01 10:02:00|C          |RUNNING|1
006|2022-06-01 10:02:02|B          |ERROR  |3
007|2022-06-01 10:02:13|A          |IDLE   |3
008|2022-06-01 10:01:02|C          |DONE   |2
However, I want to attach to each row the status and time of the previous sequence for the same system_name, like below:
id |time               |system_name|status |sequence|previous_stat|last_time          |
--- ------------------- ----------- ------- -------- ------------- -------------------
003|2022-06-01 10:01:45|A          |DONE   |2       |RUNNING      |2022-06-01 10:00:12|
004|2022-06-01 10:01:52|B          |DONE   |2       |RUNNING      |2022-06-01 10:01:02|
005|2022-06-01 10:02:00|C          |RUNNING|1       |NaN          |NaT                |
006|2022-06-01 10:02:02|B          |ERROR  |3       |DONE         |2022-06-01 10:01:52|
007|2022-06-01 10:02:13|A          |IDLE   |3       |DONE         |2022-06-01 10:01:45|
008|2022-06-01 10:01:02|C          |DONE   |2       |RUNNING      |2022-06-01 10:02:00|
I have tried using .loc like below:
df['previous_stat'] = df['status'].loc[(df['sequence']==df['sequence'].values-1)&(df['system_name']==df['system_name'])]
But I couldn't get the right previous status and last time. Any help is appreciated.
CodePudding user response:
You can use groupby with shift, then join the result back and dropna:
out = df.join(df.groupby('system_name').shift()[['status','time']].add_prefix('previous_')).dropna()
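For reference, here is a self-contained sketch of the groupby/shift approach on the sample data from the question. It rests on two assumptions that are not stated in the original: that "previous" means the row with the next-lower sequence within the same system_name (so the rows are sorted by sequence before shifting, rather than relying on row order), and that the new columns should be named previous_stat and last_time as in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["001", "002", "003", "004", "005", "006", "007", "008"],
    "time": pd.to_datetime([
        "2022-06-01 10:00:12", "2022-06-01 10:01:02", "2022-06-01 10:01:45",
        "2022-06-01 10:01:52", "2022-06-01 10:02:00", "2022-06-01 10:02:02",
        "2022-06-01 10:02:13", "2022-06-01 10:01:02",
    ]),
    "system_name": ["A", "B", "A", "B", "C", "B", "A", "C"],
    "status": ["RUNNING", "RUNNING", "DONE", "DONE",
               "RUNNING", "ERROR", "IDLE", "DONE"],
    "sequence": [1, 1, 2, 2, 1, 3, 3, 2],
})

# Assumption: order rows by sequence inside each system so that shift()
# picks up the row with the previous sequence number.
ordered = df.sort_values(["system_name", "sequence"])

# Shift status and time down by one row within each system_name group,
# then rename to the column names used in the desired output (assumed).
prev = (
    ordered.groupby("system_name")[["status", "time"]]
    .shift()
    .rename(columns={"status": "previous_stat", "time": "last_time"})
)

# join() aligns on the original index, so df keeps its row order.
# Rows without a previous sequence have NaN/NaT and are dropped here.
out = df.join(prev).dropna(subset=["previous_stat"])
print(out)
```

With this data, the rows that remain are ids 003, 004, 006, 007 and 008. To keep the first row of each system as well (with NaN and NaT in the new columns, as row 005 in the desired output), leave out the dropna call.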