Generating rows (mins) based on difference between start and end time

This is a real use case that I am trying to implement in my work.

Sample data (fake data but similar data structure)

Lap     Starttime     Endtime
1       10:00:00     10:05:00

format: hh:mm:ss

Desired output

Lap     time
1       10:00:00
1       10:01:00
1       10:02:00
1       10:03:00
1       10:04:00
1       10:05:00

So far I have only been thinking about the logic and techniques required; the code below is not correct.

import re
import pandas as pd

df = pd.read_csv('sample.csv')


#1.  to determine how many rows to generate. eg. 1000 to 1005 is 6 rows
df['time'] = df['Endtime'] - df['Startime']

#2.  add one new row with 1 added minute. eg. 6 rows
for i in No_of_rows:
    if df['time'] < df['Endtime']: #if 'time' still before end time, then continue append
          df['time'] = df['Startime']  = 1 #not sure how to select Minute part only 
    else:
          continue

Pardon my limited coding skills. I appreciate any help from the experts here. Thanks!

CodePudding user response：

Try with pd.date_range and explode:

#convert to datetime if needed
df["Starttime"] = pd.to_datetime(df["Starttime"], format="%H:%M:%S")
df["Endtime"] = pd.to_datetime(df["Endtime"], format="%H:%M:%S")

#create list of 1min ranges
df["Range"] = df.apply(lambda x: pd.date_range(x["Starttime"], x["Endtime"], freq="1min"), axis=1)

#explode, drop unneeded columns and keep only time
df = df.drop(["Starttime", "Endtime"], axis=1).explode("Range")
df["Range"] = df["Range"].dt.time

>>> df
        Range
Lap          
1    10:00:00
1    10:01:00
1    10:02:00
1    10:03:00
1    10:04:00
1    10:05:00

Input df:

df = pd.DataFrame({"Lap": [1],
                   "Starttime": ["10:00:00"],
                   "Endtime": ["10:05:00"]}).set_index("Lap")

>>> df
    Starttime   Endtime
Lap                    
1    10:00:00  10:05:00

CodePudding user response：

You can convert the times to datetimes. That arbitrarily prepends today's date (whatever date you run it on), but we can remove it later, and it makes manipulation easier:

>>> bounds = df[['Starttime', 'Endtime']].transform(pd.to_datetime)
>>> bounds
            Starttime             Endtime
0 2021-09-29 10:00:00 2021-09-29 10:05:00
1 2021-09-29 10:00:00 2021-09-29 10:02:00
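
The date shown above depends on the day the code is run. As a minimal sketch, assuming the input strings are always in hh:mm:ss form, passing an explicit format makes the result reproducible: pandas then fills in the placeholder date 1900-01-01. The variable name `fixed` is only illustrative.

```python
import pandas as pd

# With an explicit format, the missing date defaults to 1900-01-01
# instead of the current date, so the result does not depend on the run date.
fixed = pd.to_datetime("10:00:00", format="%H:%M:%S")
print(fixed)         # 1900-01-01 10:00:00

# The time-of-day part is unchanged once the date is dropped.
print(fixed.time())  # 10:00:00
```

Either way the date is thrown away at the end, so the choice only matters if you inspect or compare the intermediate values.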

Then we can simply use pd.date_range with a 1 minute frequency:

>>> times = bounds.agg(lambda s: pd.date_range(*s, freq='1min'), axis='columns')
>>> times
0    DatetimeIndex(['2021-09-29 10:00:00', '2021-09...
1    DatetimeIndex(['2021-09-29 10:00:00', '2021-09...
dtype: object

Now join that with the Lap info and use df.explode():

>>> result = df[['Lap']].join(times.rename('time')).explode('time').reset_index(drop=True)
>>> result
   Lap                time
0    1 2021-09-29 10:00:00
1    1 2021-09-29 10:01:00
2    1 2021-09-29 10:02:00
3    1 2021-09-29 10:03:00
4    1 2021-09-29 10:04:00
5    1 2021-09-29 10:05:00
6    2 2021-09-29 10:00:00
7    2 2021-09-29 10:01:00
8    2 2021-09-29 10:02:00

Finally, we want to remove the date:

>>> result['time'] = result['time'].dt.time
>>> result
   Lap      time
0    1  10:00:00
1    1  10:01:00
2    1  10:02:00
3    1  10:03:00
4    1  10:04:00
5    1  10:05:00
6    2  10:00:00
7    2  10:01:00
8    2  10:02:00

The objects in your series are now datetime.time objects.
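
If plain strings are more convenient than datetime.time values (for example when writing back to CSV), dt.strftime is one alternative to dt.time. The sketch below uses a small hypothetical series named `times` as a stand-in for the exploded column before the date is removed.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the exploded 'time' column (still datetime64).
times = pd.Series(pd.date_range("2021-09-29 10:00:00", periods=3, freq="1min"))

as_time = times.dt.time                  # object column of datetime.time values
as_text = times.dt.strftime("%H:%M:%S")  # column of plain strings

print(type(as_time.iloc[0]))  # <class 'datetime.time'>
print(as_text.tolist())       # ['10:00:00', '10:01:00', '10:02:00']
```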

CodePudding user response：

Here is another way without using apply/agg:

Convert to datetime first:

df["Starttime"] = pd.to_datetime(df["Starttime"], format="%H:%M:%S")
df["Endtime"] = pd.to_datetime(df["Endtime"], format="%H:%M:%S")

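Get the difference between the end and start times in whole minutes, then use index.repeat to repeat each row that many times plus one (so the end minute is included). Then use groupby and cumcount to number the repeated rows, convert that count to minutes with pd.to_timedelta, and add it to the start time: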

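# whole minutes between start and end, as integers (repeat needs integer counts)
repeats = (df['Endtime'].sub(df['Starttime']).dt.total_seconds() // 60).astype(int)
# one extra row per lap so that the end time itself is included
out = df.loc[df.index.repeat(repeats + 1), ['Lap', 'Starttime']]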
out['Starttime'] = (out['Starttime'].add(
                    pd.to_timedelta(out.groupby("Lap").cumcount(),'min')).dt.time)

print(out)

   Lap Starttime
0    1  10:00:00
0    1  10:01:00
0    1  10:02:00
0    1  10:03:00
0    1  10:04:00
0    1  10:05:00
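
For reference, here is a self-contained sketch of the same repeat-and-cumcount idea on a two-lap input. The second lap and the names `start`, `end`, `n_rows` and `offset` are assumptions added for illustration. It groups on the repeated index rather than on Lap, which is a small variation that should also work if a Lap value appears on more than one input row.

```python
import pandas as pd

# Sample input with two laps (the second lap is made up for illustration).
df = pd.DataFrame({
    "Lap": [1, 2],
    "Starttime": ["10:00:00", "10:00:00"],
    "Endtime": ["10:05:00", "10:02:00"],
})
start = pd.to_datetime(df["Starttime"], format="%H:%M:%S")
end = pd.to_datetime(df["Endtime"], format="%H:%M:%S")

# Whole minutes between start and end, plus one so the end minute is included.
n_rows = ((end - start) // pd.Timedelta(minutes=1)).astype(int) + 1

# Repeat each original row n_rows times; the original index label is repeated too.
out = df.loc[df.index.repeat(n_rows), ["Lap"]]

# Number the copies of each original row 0, 1, 2, ... via the repeated index.
offset = out.groupby(level=0).cumcount()

# Start time of each copy plus its minute offset, keeping only the time of day.
out["time"] = (start.loc[out.index] + pd.to_timedelta(offset, unit="min")).dt.time
out = out.reset_index(drop=True)
print(out)
```

With this input, lap 1 should expand to six rows (10:00:00 to 10:05:00) and lap 2 to three rows (10:00:00 to 10:02:00).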