I have a time series dataframe where the logs of a particular id may run from several minutes to hours. For visualisation purposes, I have to slice the logs of each id to the first 15 minutes of data.
Take the example of the df below:
df = pd.DataFrame(
{'id': [15,15,15,15,15,64,64,64,64,64],
'timestamp': ['2016-04-01 00:05:00','2016-04-01 00:10:20','2016-04-01 00:13:01',
'2016-04-01 00:14:00','2016-04-01 00:16:00','2016-04-01 21:04:59',
'2016-04-01 21:13:05','2016-04-01 21:20:00','2016-04-01 21:25:25',
'2016-04-01 21:59:59']}
)
df
id timestamp
0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 15 2016-04-01 00:16:00
5 64 2016-04-01 21:04:59
6 64 2016-04-01 21:13:05
7 64 2016-04-01 21:20:00
8 64 2016-04-01 21:25:25
9 64 2016-04-01 21:59:59
By limiting the data of each id to 15 minutes from the start, I want to have the following new_df:
id timestamp
0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 64 2016-04-01 21:04:59
5 64 2016-04-01 21:13:05
How do I achieve this?
CodePudding user response:
Your new_df doesn't match the output that you would get according to your description.
But you can use the code below and verify your desired output:
df['timestamp'] = pd.to_datetime(df['timestamp'])
d = df.groupby('id').transform(lambda x: (x-x.min()))
d['timestamp'] = d['timestamp'].apply(pd.Timedelta.total_seconds).div(60)
new_df = df[d['timestamp'] <= 15]
print(new_df)
id timestamp
0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 15 2016-04-01 00:16:00
5 64 2016-04-01 21:04:59
6 64 2016-04-01 21:13:05
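As a possible variation on the approach above, here is a minimal sketch that avoids the lambda by broadcasting each id's earliest timestamp with transform('min') and comparing Timedelta values directly. The name start is only illustrative, and the data is the df from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [15, 15, 15, 15, 15, 64, 64, 64, 64, 64],
    'timestamp': ['2016-04-01 00:05:00', '2016-04-01 00:10:20', '2016-04-01 00:13:01',
                  '2016-04-01 00:14:00', '2016-04-01 00:16:00', '2016-04-01 21:04:59',
                  '2016-04-01 21:13:05', '2016-04-01 21:20:00', '2016-04-01 21:25:25',
                  '2016-04-01 21:59:59'],
})
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Earliest timestamp of each id, repeated on every row of that id.
start = df.groupby('id')['timestamp'].transform('min')

# Keep rows at most 15 minutes (inclusive) after their id's start.
new_df = df[df['timestamp'] - start <= pd.Timedelta(minutes=15)]
print(new_df)
```

Because the mask is built on the original frame, new_df keeps the original row labels and no unit conversion to minutes is needed.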
CodePudding user response:
Assuming that the column timestamp is of datetime type (converted using pandas.to_datetime):
df['timestamp'] = pd.to_datetime(df['timestamp'])
Considering that OP wants the timer to start at the first timestamp for a given id, the following will do the work:
new_df = df.groupby('id').apply(lambda x: x[x.timestamp <= x.timestamp.iloc[0] + pd.Timedelta(minutes=15)])
[Out]:
id timestamp
id
15 0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 15 2016-04-01 00:16:00
64 5 64 2016-04-01 21:04:59
6 64 2016-04-01 21:13:05
If OP wants the timer to start at 0 hours, 0 minutes, and 0 seconds (midnight of the first timestamp's date), then the following will do the work:
new_df2 = df.groupby('id').apply(lambda x: x[x.timestamp <= x.timestamp.iloc[0].replace(hour=0, minute=0, second=0) + pd.Timedelta(minutes=15)])
[Out]:
id timestamp
id
15 0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
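For comparison, the midnight-based window can also be sketched without apply, which keeps a flat index. This is only one possible variant: it assumes Series.dt.normalize() is an acceptable stand-in for replace(hour=0, minute=0, second=0), and the names first and cutoff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [15, 15, 15, 15, 15, 64, 64, 64, 64, 64],
    'timestamp': ['2016-04-01 00:05:00', '2016-04-01 00:10:20', '2016-04-01 00:13:01',
                  '2016-04-01 00:14:00', '2016-04-01 00:16:00', '2016-04-01 21:04:59',
                  '2016-04-01 21:13:05', '2016-04-01 21:20:00', '2016-04-01 21:25:25',
                  '2016-04-01 21:59:59'],
})
df['timestamp'] = pd.to_datetime(df['timestamp'])

# First timestamp of each id, repeated on every row of that id.
first = df.groupby('id')['timestamp'].transform('first')

# Midnight of that first timestamp's date, plus the 15-minute window.
cutoff = first.dt.normalize() + pd.Timedelta(minutes=15)

new_df2 = df[df['timestamp'] <= cutoff]
print(new_df2)
```

With the question's data only the first four rows of id 15 fall before 00:15:00, so id 64 drops out entirely.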
Notes:
In both cases, the output differs from the new-df that OP shares in the question.
In the first operation, x.timestamp.iloc[0] selects the first timestamp of the group, which is used as the start of the window. The second operation does the same, but instead of keeping that value, it replaces the time part with 0 hours, 0 minutes, 0 seconds.
.iloc allows one to get rows/columns at integer locations. As we want the first one, it is .iloc[0].
pandas.Timedelta is used to represent the difference between two dates. In this case it is 15 minutes.
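The pieces described in these notes can be tried in isolation. The snippet below is a small sketch on the id 64 timestamps from the question; the variable names are illustrative. Note that .iloc[0] is the earliest timestamp only if the rows are already sorted by time, otherwise .min() would be the safer choice.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    '2016-04-01 21:04:59',
    '2016-04-01 21:13:05',
    '2016-04-01 21:20:00',
]))

# .iloc[0] picks the value at integer position 0, i.e. the first row.
start = ts.iloc[0]

# Adding a Timedelta to a Timestamp gives the end of the window.
cutoff = start + pd.Timedelta(minutes=15)

# Boolean mask: True for timestamps inside the 15-minute window.
mask = ts <= cutoff

# Same start, but with the time part reset to midnight.
midnight = start.replace(hour=0, minute=0, second=0)

print(cutoff)
print(mask.tolist())
print(midnight)
```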