How to calculate time difference between two dates in pandas Dataframe-CodePudding

I have a dataframe which is having multiple rows with column date. date column is having date and time. not each row has incremental time so I want to calculate after each row how much was the time difference between current and previous date in seconds.

import pandas as pd
data = pd.date_range('1/1/2011', periods = 10, freq ='H')

In the above snippet time difference after each step is 1hr which means 3600 seconds so I want a list of tuple having [(<prev date time>, <current_datetime>, <time_difference>),.....].

CodePudding user response：

I want a list of tuple having [(prev date time, current_datetime, time_difference),.....]

In this case, use tolal_seconds with zip :

data = pd.date_range("1/1/2011", periods = 10, freq ="H")

L = list(zip(data.shift(), # <- previous time
             data,         # <- current time
            (data.shift() - data).total_seconds())) # <- time diff

NB : If you manipulate a dataframe, you need to replace data by df["date_column"].

Output :

print(L)

[(Timestamp('2011-01-01 01:00:00', freq='H'),
  Timestamp('2011-01-01 00:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 02:00:00', freq='H'),
  Timestamp('2011-01-01 01:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 03:00:00', freq='H'),
  Timestamp('2011-01-01 02:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 04:00:00', freq='H'),
  Timestamp('2011-01-01 03:00:00', freq='H'),
  3600.0),
 (Timestamp('2011-01-01 05:00:00', freq='H'),
  Timestamp('2011-01-01 04:00:00', freq='H'),
  3600.0),
  ...

CodePudding user response：

You can achieve this by using diff function in Pandas to calculate the time difference between consecutive rows in the data column. Here's an example:

df = pd.DataFrame({"date": pd.date_range("1/1/2011", periods=10, freq="H")})

# Calculate the time difference between consecutive rows in seconds
df["time_diff"] = df["date"].diff().dt.total_seconds()

# Create a list of tuples
result = [(df.iloc[i-1]["date"], row["date"], row["time_diff"]) for i, row in df.iterrows()]

df:

    date                time_diff
0   2011-01-01 00:00:00       NaN
1   2011-01-01 01:00:00    3600.0
2   2011-01-01 02:00:00    3600.0
3   2011-01-01 03:00:00    3600.0
4   2011-01-01 04:00:00    3600.0
5   2011-01-01 05:00:00    3600.0
6   2011-01-01 06:00:00    3600.0
7   2011-01-01 07:00:00    3600.0
8   2011-01-01 08:00:00    3600.0
9   2011-01-01 09:00:00    3600.0

result:

[(Timestamp('2011-01-01 09:00:00'), Timestamp('2011-01-01 00:00:00'), NaT),
 (Timestamp('2011-01-01 00:00:00'), Timestamp('2011-01-01 01:00:00'), 3600.0),
 (Timestamp('2011-01-01 01:00:00'), Timestamp('2011-01-01 02:00:00'), 3600.0),
 (Timestamp('2011-01-01 02:00:00'), Timestamp('2011-01-01 03:00:00'), 3600.0),
 (Timestamp('2011-01-01 03:00:00'), Timestamp('2011-01-01 04:00:00'), 3600.0),
 (Timestamp('2011-01-01 04:00:00'), Timestamp('2011-01-01 05:00:00'), 3600.0),
 (Timestamp('2011-01-01 05:00:00'), Timestamp('2011-01-01 06:00:00'), 3600.0),
 (Timestamp('2011-01-01 06:00:00'), Timestamp('2011-01-01 07:00:00'), 3600.0),
 (Timestamp('2011-01-01 07:00:00'), Timestamp('2011-01-01 08:00:00'), 3600.0),
 (Timestamp('2011-01-01 08:00:00'), Timestamp('2011-01-01 09:00:00'), 3600.0)]