How to transform these columns in time

Time:09-17

I am doing some exploratory data analysis on a dataset whose date columns have this format:

0       2020-02-25 00:29:00
1       2020-02-24 23:50:00
2       2020-02-25 00:31:00
3       2020-02-25 02:03:00
4       2020-02-25 03:51:00

Subtracting the columns, column C = column A - column B, I get:

0              0 days 00:51:00
1       0 days 01:05:12.207000
2       0 days 00:50:41.817000
3              0 days 00:23:00

I'd like to convert column C to minutes, for example, but I can't find a way to account for the days in the conversion. I found this topic: How to convert datetime to integer in python, but it doesn't include days and I don't have the same separators. Is it possible, or do I have to change columns A and B somehow?

Thanks as usual,

This community is very useful for people who are getting into the topic.

CodePudding user response:

You could use dt.total_seconds() / 60 on the timedelta column (df below is that column, i.e. a Series):

>>> df.dt.total_seconds() / 60
0    51.00000
1    65.20345
2    50.69695
3    23.00000
dtype: float64
>>>

Simply get the total seconds and divide it by 60 to get the minutes.

If you don't want the decimal part, try:

>>> df.dt.total_seconds().astype(int) // 60
0    51
1    65
2    50
3    23
dtype: int32
>>> 
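
A self-contained sketch of the same idea, in case it helps to run it end to end. The Series name col_c and the extra one-day example are assumptions added for illustration; the four timedelta values are the ones shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for column C, built from the values in the question.
col_c = pd.Series(pd.to_timedelta([
    "0 days 00:51:00",
    "0 days 01:05:12.207000",
    "0 days 00:50:41.817000",
    "0 days 00:23:00",
]))

# total_seconds() already counts whole days, so days need no special handling.
minutes_float = col_c.dt.total_seconds() / 60

# Floor to whole minutes first, then cast to an integer dtype.
minutes_int = (col_c.dt.total_seconds() // 60).astype(int)

# Illustrative delta longer than one day: 1 day + 30 minutes.
long_delta = pd.Series(pd.to_timedelta(["1 days 00:30:00"]))
long_minutes = long_delta.dt.total_seconds() / 60

print(minutes_float)
print(minutes_int)
print(long_minutes)
```

The last Series should come out as 1470 minutes (1440 for the day plus 30), which suggests that columns A and B do not need to be changed as long as they are already datetimes.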

CodePudding user response:

I am not sure what type your dates are (datetime objects or plain strings?). Assuming they are strings, first convert them to datetime objects:

from datetime import datetime
datetime_object = datetime.strptime('2020-02-25 00:29:00', '%Y-%m-%d %H:%M:%S')

After converting (or if you already have datetime objects), you can subtract them:

timedelta_object = datetime_object2-datetime_object1

Finally, you can convert the timedelta into minutes:

diff_minutes = timedelta_object.total_seconds()/60
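
Put together, the three steps might look like the sketch below. The two timestamps and the names FMT, start and end are illustrative assumptions, chosen to be more than a day apart so the effect of the days is visible.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Illustrative timestamps, 4 days and 1 h 34 min apart.
start = datetime.strptime("2020-02-21 00:29:00", FMT)
end = datetime.strptime("2020-02-25 02:03:00", FMT)

delta = end - start  # a datetime.timedelta

# total_seconds() includes the days component.
diff_minutes = delta.total_seconds() / 60

# The .seconds attribute holds only what is left after whole days are removed.
leftover_minutes = delta.seconds / 60

print(diff_minutes)      # 5854.0
print(leftover_minutes)  # 94.0
```

Using total_seconds() rather than the .seconds attribute is what takes care of the days.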

Edit: Just before submitting I saw that U12-Forward has already posted an answer containing mostly the same information, but I will keep mine as it contains a bit more detail.

CodePudding user response:

Assuming your dataframe looks like this (I've just subtracted 4 days from Column A to get Column B):

             Column A            Column B
0 2020-02-25 00:29:00 2020-02-21 00:29:00
1 2020-02-24 23:50:00 2020-02-20 23:50:00
2 2020-02-25 00:31:00 2020-02-21 00:31:00
3 2020-02-25 02:03:00 2020-02-21 02:03:00
4 2020-02-25 03:51:00 2020-02-21 03:51:00

We can use numpy to get the timedelta as a number of minutes (a float, not a timedelta).

Divide by np.timedelta64(1, 'm'), where 1 is the count and 'm' is the unit (minute):

import pandas as pd
import numpy as np

df['minute_delta'] = (df['Column A'] - df['Column B']) / np.timedelta64(1,'m')

             Column A            Column B  minute_delta
0 2020-02-25 00:29:00 2020-02-21 00:29:00        5760.0
1 2020-02-24 23:50:00 2020-02-20 23:50:00        5760.0
2 2020-02-25 00:31:00 2020-02-21 00:31:00        5760.0
3 2020-02-25 02:03:00 2020-02-21 02:03:00        5760.0
4 2020-02-25 03:51:00 2020-02-21 03:51:00        5760.0
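
For a runnable version of the above, here is a minimal sketch that builds the first two rows of the frame from strings. It assumes the columns may start out as text, hence the pd.to_datetime call; the names via_numpy, via_pandas and via_seconds are only for illustration.

```python
import numpy as np
import pandas as pd

# Parse the strings first; subtraction needs datetime64 columns, not text.
df = pd.DataFrame({
    "Column A": pd.to_datetime(["2020-02-25 00:29:00", "2020-02-24 23:50:00"]),
    "Column B": pd.to_datetime(["2020-02-21 00:29:00", "2020-02-20 23:50:00"]),
})

delta = df["Column A"] - df["Column B"]  # timedelta64[ns] Series

# Three equivalent ways to express the difference in minutes as floats.
via_numpy = delta / np.timedelta64(1, "m")
via_pandas = delta / pd.Timedelta(minutes=1)
via_seconds = delta.dt.total_seconds() / 60

df["minute_delta"] = via_numpy
print(df)
```

All three variants should give 5760.0 for each row here (4 days of 1440 minutes), so the choice between them is mostly a matter of taste.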