I am doing some exploratory data analysis on a dataset whose date columns have this format:
0 2020-02-25 00:29:00
1 2020-02-24 23:50:00
2 2020-02-25 00:31:00
3 2020-02-25 02:03:00
4 2020-02-25 03:51:00
Doing the subtraction column C = column A - column B, I get:
0 0 days 00:51:00
1 0 days 01:05:12.207000
2 0 days 00:50:41.817000
3 0 days 00:23:00
I'd like to convert column C to minutes, for example, but I can't find a way to account for the days in the conversion. I found this topic: How to convert datetime to integer in python, but it doesn't include days and my values don't use the same separators. Is this possible, or do I have to change columns A and B somehow?
Thanks as usual,
This community is very useful for people who are getting into the topic.
CodePudding user response:
You could use dt.total_seconds() / 60:
>>> df.dt.total_seconds() / 60
0 51.00000
1 65.20345
2 50.69695
3 23.00000
dtype: float64
>>>
Simply get the total seconds, which already include any days in the difference, and divide by 60 to get the minutes.
If you don't want the decimal places, try:
>>> df.dt.total_seconds().astype(int) // 60
0 51
1 65
2 50
3 23
dtype: int32
>>>
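To make the approach above reproducible, here is a minimal sketch. The sample values are typed in by hand to mirror the differences shown in the question, and the names col_c, minutes_float, minutes_whole and multi_day are only illustrative. The last lines check a difference that spans several days, since that was the original concern.

```python
import pandas as pd

# Hypothetical sample data mirroring the differences shown in the question.
col_c = pd.Series(pd.to_timedelta([
    "0 days 00:51:00",
    "0 days 01:05:12.207000",
    "0 days 00:50:41.817000",
    "0 days 00:23:00",
]))

# Minutes as floats: total_seconds() covers the whole duration, days included.
minutes_float = col_c.dt.total_seconds() / 60

# Whole minutes only: floor-divide, then cast to an integer dtype.
minutes_whole = (col_c.dt.total_seconds() // 60).astype("int64")

# A difference of several days needs no special handling.
multi_day = pd.Series(pd.to_timedelta(["4 days 00:00:00"]))
multi_day_minutes = multi_day.dt.total_seconds() / 60

print(minutes_float)
print(minutes_whole)
print(multi_day_minutes)
```

Casting with an explicit "int64" is a choice made here so the result does not depend on the platform's default integer size.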
CodePudding user response:
While I am unsure of your exact date format (is it a datetime object, or just a string?), I'll assume it is a string. First, convert it to a datetime object:
from datetime import datetime
datetime_object = datetime.strptime('2020-02-25 00:29:00', '%Y-%m-%d %H:%M:%S')
After converting (or if you already have datetime objects), you can subtract them:
timedelta_object = datetime_object2 - datetime_object1
Finally, you can convert the timedelta into minutes:
diff_minutes = timedelta_object.total_seconds() / 60
Edit: Just before submitting, I saw that U12-Forward had already posted an answer containing mostly the same information, but I will keep mine as it contains a bit more detail.
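The three steps in this answer can be put together as a small standard-library sketch. The helper name minutes_between and the constant FMT are made up for illustration, and the sketch assumes both inputs are strings in the format shown in the question.

```python
from datetime import datetime

# Assumed input format, taken from the sample rows in the question.
FMT = "%Y-%m-%d %H:%M:%S"


def minutes_between(start: str, end: str) -> float:
    """Return the time from start to end in minutes (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # total_seconds() includes the days part of the timedelta.
    return delta.total_seconds() / 60


# Crosses midnight: 23:50 to 00:29 the next day.
print(minutes_between("2020-02-24 23:50:00", "2020-02-25 00:29:00"))

# Spans four full days.
print(minutes_between("2020-02-21 00:29:00", "2020-02-25 00:29:00"))
```

For a whole DataFrame column, the vectorised pandas approaches in the other answers are likely a better fit than calling a helper like this row by row.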
CodePudding user response:
Assuming your dataframe looks like this (I've just subtracted 4 days from Column A to get Column B):
Column A Column B
0 2020-02-25 00:29:00 2020-02-21 00:29:00
1 2020-02-24 23:50:00 2020-02-20 23:50:00
2 2020-02-25 00:31:00 2020-02-21 00:31:00
3 2020-02-25 02:03:00 2020-02-21 02:03:00
4 2020-02-25 03:51:00 2020-02-21 03:51:00
We can use numpy to get your timedelta as a number of minutes, by dividing it by np.timedelta64(1, 'm'), where 'm' means minute:
import pandas as pd
import numpy as np
df['minute_delta'] = (df['Column A'] - df['Column B']) / np.timedelta64(1,'m')
Column A Column B minute_delta
0 2020-02-25 00:29:00 2020-02-21 00:29:00 5760.0
1 2020-02-24 23:50:00 2020-02-20 23:50:00 5760.0
2 2020-02-25 00:31:00 2020-02-21 00:31:00 5760.0
3 2020-02-25 02:03:00 2020-02-21 02:03:00 5760.0
4 2020-02-25 03:51:00 2020-02-21 03:51:00 5760.0
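In case the two columns are still stored as strings, the same division can be sketched end to end as below. This assumes the values follow the format shown in the question; the two-row frame is a cut-down, hand-typed copy of the sample above, and parsing with pd.to_datetime is an extra step not shown in the original snippet.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the sample frame, with the dates deliberately kept as strings.
df = pd.DataFrame({
    "Column A": ["2020-02-25 00:29:00", "2020-02-24 23:50:00"],
    "Column B": ["2020-02-21 00:29:00", "2020-02-20 23:50:00"],
})

# Parse the strings first; subtracting raw strings would raise an error.
for col in ["Column A", "Column B"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d %H:%M:%S")

# Dividing a timedelta column by a one-minute timedelta gives float minutes.
df["minute_delta"] = (df["Column A"] - df["Column B"]) / np.timedelta64(1, "m")

print(df)
```

The result is a float column, so fractional minutes are kept; it can be cast to an integer dtype if only whole minutes are needed.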