I have time series data with a Unix-seconds column (among other columns):
df = pd.DataFrame(
{
'user': [3,3,3,3,3,6,6,6],
'timestamp': [1459467971, 1459468020, 1459468026, 1459468031,
1459468036,1513974852, 1513974853, 1513974854]
}
)
The dataset is for classifying events: some are frequent in the morning, others in the afternoon.
I think time-of-day could also be an important discriminant in this task.
How do I extract the time of day, as a number of seconds, from this?
Note:
I know that converting with pandas to_datetime and then taking .dt.time gives HH:MM:SS, as in:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
#df['date'] = df['timestamp'].dt.date
df['time'] = df['timestamp'].dt.time
df
user timestamp time
0 3 2016-03-31 23:46:11 23:46:11
1 3 2016-03-31 23:47:00 23:47:00
2 3 2016-03-31 23:47:06 23:47:06
3 3 2016-03-31 23:47:11 23:47:11
4 3 2016-03-31 23:47:16 23:47:16
5 6 2017-12-22 20:34:12 20:34:12
6 6 2017-12-22 20:34:13 20:34:13
7 6 2017-12-22 20:34:14 20:34:14
Isn't there an easy way to get the time as a number of seconds since midnight, like this:
user timestamp time-of-day
0 3 2016-03-31 23:46:11 85571
1 3 2016-03-31 23:47:00 85620
2 3 2016-03-31 23:47:06 85626
3 3 2016-03-31 23:47:11 85631
4 3 2016-03-31 23:47:16 85636
5 6 2017-12-22 20:34:12 74052
6 6 2017-12-22 20:34:13 74053
7 6 2017-12-22 20:34:14 74054
CodePudding user response:
You can use dt.normalize to subtract the date part:
df['time-of-day'] = (df['timestamp'].sub(df['timestamp'].dt.normalize())
.dt.total_seconds().astype(int))
print(df)
# Output
user timestamp date time time-of-day
0 3 2016-03-31 23:46:11 2016-03-31 23:46:11 85571
1 3 2016-03-31 23:47:00 2016-03-31 23:47:00 85620
2 3 2016-03-31 23:47:06 2016-03-31 23:47:06 85626
3 3 2016-03-31 23:47:11 2016-03-31 23:47:11 85631
4 3 2016-03-31 23:47:16 2016-03-31 23:47:16 85636
5 6 2017-12-22 20:34:12 2017-12-22 20:34:12 74052
6 6 2017-12-22 20:34:13 2017-12-22 20:34:13 74053
7 6 2017-12-22 20:34:14 2017-12-22 20:34:14 74054
Note: I'm not sure time-of-day is relevant as a continuous variable. Maybe you should consider discretizing it into 8 blocks of 3 hours?
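A minimal sketch of that discretization, assuming the blocks start at midnight and that the timestamps are treated as UTC (the column name time-block is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'user': [3, 3, 3, 3, 3, 6, 6, 6],
    'timestamp': [1459467971, 1459468020, 1459468026, 1459468031,
                  1459468036, 1513974852, 1513974853, 1513974854],
})
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')

# Integer-divide the hour by 3: block 0 covers 00:00-02:59, block 7 covers 21:00-23:59.
# 'time-block' is a hypothetical column name, not something from the question.
df['time-block'] = df['timestamp'].dt.hour // 3
print(df)
```

With the sample data, the 23:xx rows fall into block 7 and the 20:xx rows into block 6. The result can then be treated as a categorical feature rather than a continuous one.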
CodePudding user response:
You can do it using hour, minute and second and some basic algebra:
df['time-of-day'] = df['timestamp'].dt.hour * 3600 + df['timestamp'].dt.minute * 60 + df['timestamp'].dt.second
CodePudding user response:
You can also use this:
df['time-of-day'] = df['timestamp'].apply(lambda x: x.second + x.minute * 60 + x.hour * 3600)
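To cross-check the approaches in this thread, here is a small self-contained sketch that runs all three on the sample data. The last line is an extra assumption of mine, not from the answers: since the original column is Unix seconds, the UTC time of day should simply be the remainder modulo 86400, with no time zone handling at all. The variable names are illustrative.

```python
import pandas as pd

raw = pd.Series([1459467971, 1459468020, 1459468026, 1459468031,
                 1459468036, 1513974852, 1513974853, 1513974854])
ts = pd.to_datetime(raw, unit='s')

# 1) Subtract midnight of the same day, then convert the timedelta to seconds.
by_normalize = ts.sub(ts.dt.normalize()).dt.total_seconds().astype(int)

# 2) Vectorized arithmetic on the hour/minute/second components.
by_fields = ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second

# 3) Row-wise apply; same arithmetic, but typically slower on large frames.
by_apply = ts.apply(lambda x: x.second + x.minute * 60 + x.hour * 3600)

# Assumption: UTC time of day is wanted, so the raw Unix seconds can be reduced directly.
by_modulo = raw % 86400

print(by_normalize.tolist())
print(by_normalize.tolist() == by_fields.tolist() == by_apply.tolist() == by_modulo.tolist())
```

If the events should be bucketed by local time rather than UTC, the timestamps would need to be localized and converted first, and the modulo shortcut no longer applies.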