How can I get the union of several time intervals for each id and name?
import pandas as pd
id = [1,1,1,1,1,1,1,2,2,2]
name = ['A','A','A','A','A','B','B','C','C','C']
Start_time = ['2005-06-27','2005-07-07','2005-07-12','2006-11-15','2008-08-22','2009-03-03','2009-03-06','2007-10-26','2007-10-31','2007-11-06']
Final_time = ['2005-07-07','2005-07-12','2005-09-26','2008-08-22','2009-02-24','2009-03-06','2009-03-12','2007-10-31','2007-11-05','2007-11-09']
dataframe = pd.DataFrame({'id':id,'name':name,'Start_time':Start_time,'Final_time':Final_time})
dataframe['Start_time'] = pd.to_datetime(dataframe['Start_time'])
dataframe['Final_time'] = pd.to_datetime(dataframe['Final_time'])
If the time intervals for a given id and name can be merged, the corresponding rows should share the same result_S and result_F, just as the image shows.
CodePudding user response:
You can accomplish that with a combination of the mask, backward fill (bfill) and forward fill (ffill) methods in pandas. Example code below:
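# Pair each row (df_s) with the row directly before it (df_f)
df_s = dataframe[1:]
df_f = dataframe[:-1]
# True where a row starts exactly when the previous row ends, with the same name and id
conds = (df_s.Start_time.values == df_f.Final_time.values) & (df_s.name.values == df_f.name.values) & (df_s.id.values == df_f.id.values)
# Align the flags with the continuing row (conds_s) and with the row being continued (conds_f)
conds_s = [False] + list(conds)
conds_f = list(conds) + [False]
# Blank out the inner boundaries, then fill them from the ends of each merged run
dataframe['Result_S'] = dataframe['Start_time'].mask(conds_s).ffill()
dataframe['Result_F'] = dataframe['Final_time'].mask(conds_f).bfill()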
Note that the solution above assumes that dataframe is already sorted, so that the rows for each id and name are adjacent and ordered by Start_time.
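If the rows are not guaranteed to be sorted, one possible variant is to sort first and label each run of touching intervals with a group number. The sketch below is an alternative to the mask/fill approach, not part of the original answer. The function name merge_adjacent_intervals and the small demo frame are made up for illustration. Like the code above, it only merges intervals where Start_time equals the previous Final_time exactly; overlapping intervals would need a different comparison.

```python
import pandas as pd

def merge_adjacent_intervals(df):
    """Return a copy of df with Result_S / Result_F holding each row's merged interval."""
    # Sort so that touching intervals of the same (id, name) sit next to each other
    out = df.sort_values(['id', 'name', 'Start_time']).copy()
    # Final_time of the previous row within the same (id, name); NaT for the first row
    prev_final = out.groupby(['id', 'name'])['Final_time'].shift()
    # A new block starts whenever a row does not begin where the previous one ended
    new_block = out['Start_time'] != prev_final
    block = new_block.cumsum()
    # Every row in a block gets the block's earliest start and latest end
    out['Result_S'] = out.groupby(block)['Start_time'].transform('min')
    out['Result_F'] = out.groupby(block)['Final_time'].transform('max')
    # Restore the original row order
    return out.sort_index()

# Hypothetical demo data: the id 2 / name C rows from the question
demo = pd.DataFrame({
    'id': [2, 2, 2],
    'name': ['C', 'C', 'C'],
    'Start_time': pd.to_datetime(['2007-10-26', '2007-10-31', '2007-11-06']),
    'Final_time': pd.to_datetime(['2007-10-31', '2007-11-05', '2007-11-09']),
})
result = merge_adjacent_intervals(demo)
print(result)
```

In the demo, the first two rows touch at 2007-10-31 and end up sharing one merged interval, while the third row starts a day after the second one ends and stays on its own.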