How to remove a datetime that starts before the endtime of the previous datetime

I want to build a list of event times that contains no duplicate datetimes and no event whose start time falls before the end time of the previous event. I converted the datetimes to Julian dates. The list is shown below:

Before converting to Julian date:

   timename      starttime                endtime
1  20170103      2017/01/03T05:36:05      2017/01/03T13:42:04
2  20170103      2017/01/03T14:00:04      2017/01/03T20:30:07
3  20170112      2017/01/12T16:24:05      2017/01/12T23:42:05
4  20170114      2017/01/14T05:00:05      2017/01/14T09:42:06
5  20170114      2017/01/14T08:24:05      2017/01/14T14:18:05
6  20170117      2017/01/17T21:49:12      2017/01/18T02:30:06
7  20170119      2017/01/19T12:24:05      2017/01/19T18:30:06
8  20170124      2017/01/24T15:36:05      2017/01/25T04:42:07
9  20170125      2017/01/25T16:00:05      2017/01/26T14:30:05
...

After converting to Julian date:

  timename   starttime           endtime
1 2457756.5  2457756.733391204   2457757.07087963
2 2457756.5  2457757.0833796295  2457757.354247685
3 2457765.5  2457766.1833912036  2457766.4875578703
4 2457767.5  2457767.7083912035  2457767.9042361113
5 2457767.5  2457767.85005787    2457768.0958912037
6 2457770.5  2457771.4091666667  2457771.604236111
7 2457772.5  2457773.016724537   2457773.2709027776
8 2457777.5  2457778.1500578704  2457778.695914352
9 2457778.5  2457779.166724537   2457780.104224537
...

I want to remove a row if its starttime falls between the starttime and endtime of the previous row; otherwise, keep it. For example, the 5th row's starttime (2017/01/14T08:24:05) is before the 4th row's endtime (2017/01/14T09:42:06), so the 4th row's interval includes the 5th row's starttime. I need to remove such rows and keep only the events that start after the previous endtime.

Here is my attempt.

import itertools
import pandas as pd

infilename2 = 'D:/My works/C3_jdate/2017.txt'
outfilename2 = 'D:/My works/2017_1.txt'


lines_seen = set()
outfile = open(outfilename2, "w")
df = pd.read_csv(infilename2,sep = '\t', engine='python', header=0)
    
t = df['timename']
t1 = df['timename']
st = df['starttime']
et = df['endtime']

for i in range(len(df)):
    for j in range(i + 1, len(df)):
        if t[i] != t1[j]:
            pass
        elif t[i] == t1[j]:
            if (st[i]<st[j] and et[i]<=st[j]):
                
                line1 = df.to_string()
                if line1 not in lines_seen:
                    outfile.write(line1)
                    lines_seen.add(line1)
outfile.close()

But the output is still the same as the input. Any ideas for solving this would be appreciated.

CodePudding user response：

Perhaps this will work:

# Get time between current start and last end
time_from_last_endtime = df["starttime"] - df["endtime"].shift(fill_value=0)
df[time_from_last_endtime >= 0]
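
A minimal sketch of this approach, run on rows 3 to 6 of the Julian-date table from the question. The DataFrame is built inline for illustration rather than read from the asker's file, and the names gap and result are only illustrative.

```python
import pandas as pd

# Rows 3 to 6 of the Julian-date table in the question, typed in by hand.
df = pd.DataFrame({
    "timename": [2457765.5, 2457767.5, 2457767.5, 2457770.5],
    "starttime": [2457766.1833912036, 2457767.7083912035,
                  2457767.85005787, 2457771.4091666667],
    "endtime": [2457766.4875578703, 2457767.9042361113,
                2457768.0958912037, 2457771.604236111],
})

# Gap between each starttime and the previous row's endtime.
# fill_value=0 means the first row is compared against 0 and is always kept.
gap = df["starttime"] - df["endtime"].shift(fill_value=0)

# Keep only the rows that start at or after the previous row's endtime.
result = df[gap >= 0].reset_index(drop=True)
print(result)
```

Note that each row is compared with the row directly above it in the original table, even when that row is itself removed by the filter.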

CodePudding user response：

For each row, you could subtract the previous row's endtime from the current row's starttime using pandas:

df['diff'] = df['starttime'] - df['endtime'].shift(1)

Here, .shift moves the values down by the specified number of periods (1 in this case), so each row is lined up with the previous row's endtime.
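
As a small illustration of what shift(1) does, here is a toy Series (the values are made up for the example). Each value moves one position down, and the first position, which has no predecessor, becomes NaN.

```python
import pandas as pd

# Made-up endtime values, only to show the effect of shift(1).
endtime = pd.Series([10.0, 20.0, 30.0], name="endtime")

# Each position now holds the value from the row above it.
shifted = endtime.shift(1)
print(shifted)
```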

Make sure that your data is sorted by starttime beforehand.

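Then you can filter out the rows where diff is negative, keeping only the rows where the difference between the current starttime and the previous endtime is greater than 0. The comparison below assumes starttime and endtime are datetime columns, so that diff holds Timedelta values; for Julian-date floats, compare against 0 instead. The first row has no previous endtime, so its diff is missing and this filter drops it as well:

filtered_df = df[df['diff'] > pd.Timedelta('0 days 00:00:00')]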
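
Putting these steps together, the following is a sketch that assumes the original string timestamps from the question (rows 3 to 6, typed in by hand) rather than the Julian dates. The format string and the keep mask are assumptions added for illustration; the mask keeps the first row explicitly, since its diff is NaT.

```python
import pandas as pd

# Rows 3 to 6 of the original (pre-Julian) table in the question.
df = pd.DataFrame({
    "timename": [20170112, 20170114, 20170114, 20170117],
    "starttime": ["2017/01/12T16:24:05", "2017/01/14T05:00:05",
                  "2017/01/14T08:24:05", "2017/01/17T21:49:12"],
    "endtime": ["2017/01/12T23:42:05", "2017/01/14T09:42:06",
                "2017/01/14T14:18:05", "2017/01/18T02:30:06"],
})

# Parse the strings into datetimes so that subtraction yields Timedelta values.
for col in ("starttime", "endtime"):
    df[col] = pd.to_datetime(df[col], format="%Y/%m/%dT%H:%M:%S")

# Sort by starttime, then compare each start with the previous row's end.
df = df.sort_values("starttime").reset_index(drop=True)
df["diff"] = df["starttime"] - df["endtime"].shift(1)

# Keep the first row (diff is NaT) and every row with a positive gap.
keep = df["diff"].isna() | (df["diff"] > pd.Timedelta(0))
filtered_df = df[keep]
print(filtered_df)
```

Here the second 20170114 event is dropped, because it starts at 08:24:05 while the previous event ends at 09:42:06.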