I have a dataframe that has a column in datetime type such as:
date id
0 02/04/2015 02:34 1
1 06/04/2015 12:34 2
2 09/04/2015 23:03 3
3 12/04/2015 01:00 4
4 15/04/2015 07:12 5
5 21/04/2015 12:59 6
6 29/04/2015 17:33 7
7 04/05/2015 10:44 8
8 06/05/2015 11:12 9
9 10/05/2015 08:52 10
10 12/05/2015 14:19 11
11 19/05/2015 19:22 12
12 27/05/2015 22:31 13
13 01/06/2015 11:09 14
14 04/06/2015 12:57 15
15 10/06/2015 04:00 16
16 15/06/2015 03:23 17
17 19/06/2015 05:37 18
18 23/06/2015 13:41 19
19 27/06/2015 15:43 20
I would like to change the date column reference such that the first value 02/04/2015 02:34 will be 0, and the second one will be the difference between 06/04/2015 12:34 - 02/04/2015 02:34 = 6360 minutes or 106 Hours.
and so on. How can I change the column in pandas to something similar to the above explanation.
Thanks
CodePudding user response:
You could tackle this problem in 2 steps (and a last optional step):
- Align the values in
"date"
column with the target value (the one in the next row) - Calculate the difference between both dates
- (Optional) Convert the time delta to a relevant time unit (for example, minutes)
So, how could we do this?
For the first step, we'll shift the values using the pandas.Series.shift
method.
df["date2"] = df["date"].shift(1)
This will add a "date2"
column with the values of the "date"
column shifted by adding an empty value at the top. To fill this space, we'll use the first value again (you requested the first value to be zero and we'll use difference next so...)
df.loc[0, "date2"] = df.loc[0, "date"]
For the second step, we'll just take the difference between our columns. Because both "date"
and "date2"
are datetime64[ns]
columns, we are able to perform this operation:
df["diff"] = df["date"] - df["date2"]
This is how your DataFrame
first 5 rows looks now:
date id date2 diff
0 2015-04-02 02:34:00 1 2015-04-02 02:34:00 0 days 00:00:00
1 2015-04-06 12:34:00 2 2015-04-02 02:34:00 4 days 10:00:00
2 2015-04-09 23:03:00 3 2015-04-06 12:34:00 3 days 10:29:00
3 2015-04-12 01:00:00 4 2015-04-09 23:03:00 2 days 01:57:00
4 2015-04-15 07:12:00 5 2015-04-12 01:00:00 3 days 06:12:00
Please consider that the "diff"
column has timedelta64[ns]
type. If you ant to transform this unit to minutes...
df["diff"].dt.total_seconds().div(60).astype(int)
Same first 5 rows of your DataFrame
with the last optional step:
date id date2 diff
0 2015-04-02 02:34:00 1 2015-04-02 02:34:00 0
1 2015-04-06 12:34:00 2 2015-04-02 02:34:00 6360
2 2015-04-09 23:03:00 3 2015-04-06 12:34:00 4949
3 2015-04-12 01:00:00 4 2015-04-09 23:03:00 2997
4 2015-04-15 07:12:00 5 2015-04-12 01:00:00 4692
CodePudding user response:
here is my solution:
import pandas as pd
df = pd.DataFrame(date, columns=['date'])
df['sec'] = pd.to_datetime(df['date']).astype('int64') // 10**9
df['diff_in_sec'] = df['sec'] - df['sec'].iloc[0]
first, convert the date column into the DateTime column then convert it to UNIX timestamp and then calculate the difference in seconds. if you want it as minutes or hours just divide diff_in_sec by 60 or 3600 like this:
df['diff_in_hour'] = df['diff_in_sec'].div(3600)
CodePudding user response:
Convert date
column to datetime64
and use diff
. After that, use dt
accessor to floor
your date by hour and get total_seconds
and divide by 60 or 3600:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['hours'] = df['date'].diff().dt.floor('H').dt.total_seconds().div(3600).fillna(0).astype(int)
# OR
df['minutes'] = df['date'].diff().dt.floor('H').dt.total_seconds().div(60).fillna(0).astype(int)
Output:
>>> df
date id hours minutes
0 2015-04-02 02:34:00 1 0 0
1 2015-04-06 12:34:00 2 106 6360
2 2015-04-09 23:03:00 3 82 4920
3 2015-04-12 01:00:00 4 49 2940
4 2015-04-15 07:12:00 5 78 4680
5 2015-04-21 12:59:00 6 149 8940
6 2015-04-29 17:33:00 7 196 11760
7 2015-05-04 10:44:00 8 113 6780
8 2015-05-06 11:12:00 9 48 2880
9 2015-05-10 08:52:00 10 93 5580
10 2015-05-12 14:19:00 11 53 3180
11 2015-05-19 19:22:00 12 173 10380
12 2015-05-27 22:31:00 13 195 11700
13 2015-06-01 11:09:00 14 108 6480
14 2015-06-04 12:57:00 15 73 4380
15 2015-06-10 04:00:00 16 135 8100
16 2015-06-15 03:23:00 17 119 7140
17 2015-06-19 05:37:00 18 98 5880
18 2015-06-23 13:41:00 19 104 6240
19 2015-06-27 15:43:00 20 98 5880