I have a logged dataset that comes in bursts. The logger is set to 16 Hz, so it records 960 records in a minute, pauses for 300 seconds, then bursts again. So in my dataframe it is recorded as:
1 1 data data data
1 2 data data data
1 3 data data data
~~
1 960 data data data
2 1 data data data
2 2 data data data
And so on...
If I know the start date and time, can I add a new column and calculate a time/date for each recorded event in Pandas? Hope that all makes sense...
I successfully read the file into Pandas using:
dataframe = pd.read_csv('filename.dat', sep='\s+', header=None)
CodePudding user response:
Sample
import pandas as pd
import numpy as np
df = pd.DataFrame({
    # 20 cycles of 960 records each
    'cycle': np.repeat(np.arange(1, 21), 960),
    'record': np.tile(np.arange(1, 961), 20)
})
print(df)
Output:
cycle record
0 1 1
1 1 2
2 1 3
3 1 4
4 1 5
... ... ...
19195 20 956
19196 20 957
19197 20 958
19198 20 959
19199 20 960
[19200 rows x 2 columns]
Solution
You can use pd.date_range and pd.Timedelta to achieve your goal:
startdate = pd.to_datetime('2020-12-13 23:30:00')
# one burst: 960 timestamps spread across exactly one minute
period = lambda x: pd.date_range(x, x + pd.Timedelta(1, 'min'), periods=960)
dr = period(startdate)
while len(dr) < len(df):
    # next burst starts 300 s after the last record of the previous one
    followdate = dr[-1] + pd.Timedelta(300, 'sec')
    dr = dr.append(period(followdate))
df['time'] = dr[0:len(df)]
print(df)
Output:
cycle record time
0 1 1 2020-12-13 23:30:00.000000000
1 1 2 2020-12-13 23:30:00.062565172
2 1 3 2020-12-13 23:30:00.125130344
3 1 4 2020-12-13 23:30:00.187695516
4 1 5 2020-12-13 23:30:00.250260688
... ... ... ...
19195 20 956 2020-12-14 01:24:59.749739311
19196 20 957 2020-12-14 01:24:59.812304483
19197 20 958 2020-12-14 01:24:59.874869655
19198 20 959 2020-12-14 01:24:59.937434827
19199 20 960 2020-12-14 01:25:00.000000000
[19200 rows x 3 columns]
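As an alternative sketch, the timestamp can also be computed directly from the cycle and record columns without a loop. This assumes each record is spaced exactly 1/16 s (62.5 ms) apart and each cycle spans 360 s (60 s of data + 300 s pause), so the spacing differs very slightly from the date_range version above, which stretches 960 points across exactly one minute (60/959 s apart):

```python
import pandas as pd
import numpy as np

# Sample frame: 20 cycles of 960 records each
df = pd.DataFrame({
    'cycle': np.repeat(np.arange(1, 21), 960),
    'record': np.tile(np.arange(1, 961), 20)
})

startdate = pd.to_datetime('2020-12-13 23:30:00')

# cycle n starts (n-1)*360 s after the start date;
# record m within a burst is (m-1)/16 s after the burst start
df['time'] = (startdate
              + pd.to_timedelta((df['cycle'] - 1) * 360, unit='s')
              + pd.to_timedelta((df['record'] - 1) / 16, unit='s'))
print(df)
```

Being fully vectorized, this also scales to large frames without growing a DatetimeIndex in a loop.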