pandas calculate time difference per several levels in one column df-CodePudding

I have a following dataset:

I would like to get a result as follows:

The goal is to calculate duration per "Level" column.

Dataset:

import pandas as pd
from datetime import datetime, date

data = {'Time': ["08:35:00", "08:40:00", "08:45:00", "08:55:00", "08:57:00", "08:59:00"],
        'Level': [250, 250, 250, 200, 200, 200]}

df = pd.DataFrame(data)
df['Time'] = pd.to_datetime(df['Time'],format= '%H:%M:%S' ).dt.time

Difference between two datetimes i am able to calculate with the code:

t1 = df['Time'].iloc[0]
t2 = df['Time'].iloc[1]

c = datetime.combine(date.today(), t2) - datetime.combine(date.today(), t1)

But i am not able to "automate" the calculation. This code works the only for integers.

df2 = df.groupby('Level').apply(lambda x: x.Time.max() - x.Time.min())

CodePudding user response：

If you keep the date part of Time, the calculation is a lot easier:

df = pd.DataFrame(data)

# Keep the date part, even though it's meaningless
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")


def to_string(duration: pd.Timedelta) -> str:
    total = duration.total_seconds()
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02.0f}:{minutes:02.0f}:{seconds:02.0f}"


level = df["Level"]

# CAUTION: avoid calling to_string until the very last step,
# when you need to display your result. There's not many
# calculations you can do with strings.
df["Time"].groupby(level).diff().groupby(level).sum().apply(to_string)