I want to build a list of event times that contains no duplicate date-times and no event whose start time falls before the end time of the previous event. I converted the date-times to Julian dates. The list is shown below:
Before converting to Julian date:
timename starttime endtime
1 20170103 2017/01/03T05:36:05 2017/01/03T13:42:04
2 20170103 2017/01/03T14:00:04 2017/01/03T20:30:07
3 20170112 2017/01/12T16:24:05 2017/01/12T23:42:05
4 20170114 2017/01/14T05:00:05 2017/01/14T09:42:06
5 20170114 2017/01/14T08:24:05 2017/01/14T14:18:05
6 20170117 2017/01/17T21:49:12 2017/01/18T02:30:06
7 20170119 2017/01/19T12:24:05 2017/01/19T18:30:06
8 20170124 2017/01/24T15:36:05 2017/01/25T04:42:07
9 20170125 2017/01/25T16:00:05 2017/01/26T14:30:05
...
After converting to Julian date:
timename starttime endtime
1 2457756.5 2457756.733391204 2457757.07087963
2 2457756.5 2457757.0833796295 2457757.354247685
3 2457765.5 2457766.1833912036 2457766.4875578703
4 2457767.5 2457767.7083912035 2457767.9042361113
5 2457767.5 2457767.85005787 2457768.0958912037
6 2457770.5 2457771.4091666667 2457771.604236111
7 2457772.5 2457773.016724537 2457773.2709027776
8 2457777.5 2457778.1500578704 2457778.695914352
9 2457778.5 2457779.166724537 2457780.104224537
...
I want to remove a row if its starttime falls between the starttime and endtime of the previous row; otherwise the row should be kept. For example, the starttime of the 5th row (2017/01/14T08:24:05) is before the endtime of the 4th row (2017/01/14T09:42:06), so the 4th row's interval includes the 5th row's starttime. I need to remove such rows and keep only the events that start after the previous endtime.
Here is my attempt.
import itertools
import pandas as pd
infilename2 = 'D:/My works/C3_jdate/2017.txt'
outfilename2 = 'D:/My works/2017_1.txt'
lines_seen = set()
outfile = open(outfilename2, "w")
df = pd.read_csv(infilename2,sep = '\t', engine='python', header=0)
t = df['timename']
t1 = df['timename']
st = df['starttime']
et = df['endtime']
for i in range(len(df)):
    for j in range(i + 1, len(df)):
        if t[i] != t1[j]:
            pass
        elif t[i] == t1[j]:
            if (st[i]<st[j] and et[i]<=st[j]):
                line1 = df.to_string()
                if line1 not in lines_seen:
                    outfile.write(line1)
                    lines_seen.add(line1)
outfile.close()
However, the output is still the same as the input. Any ideas for solving this would be appreciated.
CodePudding user response:
Perhaps this will work:
# Get time between current start and last end
time_from_last_endtime = df["starttime"] - df["endtime"].shift(fill_value=0)
df[time_from_last_endtime >= 0]
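To check the idea, here is a minimal sketch that applies the same two lines to rows 3 to 6 of the Julian-date table from the question. The small DataFrame is built by hand for illustration (in practice it would come from `pd.read_csv`), and the names `gap` and `kept` are only placeholders.

```python
import pandas as pd

# Rows 3-6 of the Julian-date table in the question, typed in by hand.
df = pd.DataFrame(
    {
        "timename": [2457765.5, 2457767.5, 2457767.5, 2457770.5],
        "starttime": [2457766.1833912036, 2457767.7083912035,
                      2457767.85005787, 2457771.4091666667],
        "endtime": [2457766.4875578703, 2457767.9042361113,
                    2457768.0958912037, 2457771.604236111],
    },
    index=[3, 4, 5, 6],
)

# Gap between each start and the end of the row directly above it.
# fill_value=0 gives the first row a large positive gap, so it is kept.
gap = df["starttime"] - df["endtime"].shift(fill_value=0)

# Keep only events that start at or after the previous end.
kept = df[gap >= 0]
print(kept.index.tolist())  # [3, 4, 6]
```

Row 5 is dropped because it starts before row 4 ends. Note that each row is compared only with the row directly above it in the original table, not with the last row that was kept.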
CodePudding user response:
For each row, you could subtract the previous row's endtime from the current row's starttime, using pandas:
df['diff'] = df['starttime'] - df['endtime'].shift(1)
Here, .shift(1) moves the endtime values down by one row, so each row is lined up with the endtime of the row before it.
Make sure that your data is sorted by starttime first.
Then you can filter out the rows where diff is negative, that is, keep only the rows where the difference between the current starttime and the previous endtime is greater than 0:
filtered_df = df[df['diff'] > pd.Timedelta('0 days 00:00:00')]
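The comparison with `pd.Timedelta` only works if starttime and endtime are datetime columns, not Julian-date floats. Below is a minimal sketch of the whole approach under that assumption, using three rows from the original (non-Julian) table. The format string `fmt` and the mask name `keep` are illustrative. The first row has no previous endtime, so its diff is NaT and a plain `> 0` filter would drop it; the sketch keeps it explicitly.

```python
import pandas as pd

# Rows 4-6 of the original table in the question, typed in by hand.
df = pd.DataFrame(
    {
        "starttime": ["2017/01/14T05:00:05", "2017/01/14T08:24:05",
                      "2017/01/17T21:49:12"],
        "endtime": ["2017/01/14T09:42:06", "2017/01/14T14:18:05",
                    "2017/01/18T02:30:06"],
    }
)

# Parse the strings so that subtraction yields Timedelta values.
fmt = "%Y/%m/%dT%H:%M:%S"
for col in ("starttime", "endtime"):
    df[col] = pd.to_datetime(df[col], format=fmt)

# Sort by start time, then compare each start with the previous end.
df = df.sort_values("starttime").reset_index(drop=True)
df["diff"] = df["starttime"] - df["endtime"].shift(1)

# Keep the first row (diff is NaT) and every row with a positive gap.
keep = df["diff"].isna() | (df["diff"] > pd.Timedelta(0))
filtered_df = df[keep]
print(filtered_df[["starttime", "endtime"]])
```

With the Julian-date floats from the question, the same filter can be written as `df['diff'] > 0` instead of comparing against a Timedelta.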