This is a real use case that I am trying to implement in my work.
Sample data (fake data but similar data structure)
Lap Starttime Endtime
1 10:00:00 10:05:00
format: hh:mm:ss
Desired output
Lap time
1 10:00:00
1 10:01:00
1 10:02:00
1 10:03:00
1 10:04:00
1 10:05:00
so far only trying to think of the logic and techniques required... the codes are not correct
import re
import pandas as pd
df = pd.read_csv('sample.csv')
#1. to determine how many rows to generate. eg. 1000 to 1005 is 6 rows
df['time'] = df['Endtime'] - df['Startime']
#2. add one new row with 1 added minute. eg. 6 rows
for i in No_of_rows:
if df['time'] < df['Endtime']: #if 'time' still before end time, then continue append
df['time'] = df['Startime'] = 1 #not sure how to select Minute part only
else:
continue
pardon my limited coding skills. appreciate all the help from you experts.. thanks!
CodePudding user response:
Try with pd.date_range
and explode
:
#convert to datetime if needed
df["Starttime"] = pd.to_datetime(df["Starttime"], format="%H:%M:%S")
df["Endtime"] = pd.to_datetime(df["Endtime"], format="%H:%M:%S")
#create list of 1min ranges
df["Range"] = df.apply(lambda x: pd.date_range(x["Starttime"], x["Endtime"], freq="1min"), axis=1)
#explode, drop unneeded columns and keep only time
df = df.drop(["Starttime", "Endtime"], axis=1).explode("Range")
df["Range"] = df["Range"].dt.time
>>> df
Range
Lap
1 10:00:00
1 10:01:00
1 10:02:00
1 10:03:00
1 10:04:00
1 10:05:00
Input df:
df = pd.DataFrame({"Lap": [1],
"Starttime": ["10:00:00"],
"Endtime": ["10:05:00"]}).set_index("Lap")
>>> df
Starttime Endtime
Lap
1 10:00:00 10:05:00
CodePudding user response:
You can convert the times to datetimes, that will arbitrarily prepend the date of today (at whatever date you’re running) but we can then remove that later and it allows for easier manupulation:
>>> bounds = df[['Starttime', 'Endtime']].transform(pd.to_datetime)
>>> bounds
Starttime Endtime
0 2021-09-29 10:00:00 2021-09-29 10:05:00
1 2021-09-29 10:00:00 2021-09-29 10:02:00
Then we can simply use pd.date_range
with a 1 minute frequency:
>>> times = bounds.agg(lambda s: pd.date_range(*s, freq='1min'), axis='columns')
>>> times
0 DatetimeIndex(['2021-09-29 10:00:00', '2021-09...
1 DatetimeIndex(['2021-09-29 10:00:00', '2021-09...
dtype: object
Now joining that with the Lap
info and using df.explode()
:
>>> result = df[['Lap']].join(times.rename('time')).explode('time').reset_index(drop=True)
>>> result
Lap time
0 1 2021-09-29 10:00:00
1 1 2021-09-29 10:01:00
2 1 2021-09-29 10:02:00
3 1 2021-09-29 10:03:00
4 1 2021-09-29 10:04:00
5 1 2021-09-29 10:05:00
6 2 2021-09-29 10:00:00
7 2 2021-09-29 10:01:00
8 2 2021-09-29 10:02:00
Finally we wanted to remove the day:
>>> result['time'] = result['time'].dt.time
>>> result
Lap time
0 1 10:00:00
1 1 10:01:00
2 1 10:02:00
3 1 10:03:00
4 1 10:04:00
5 1 10:05:00
6 2 10:00:00
7 2 10:01:00
8 2 10:02:00
The objects in your series are now datetime.time
CodePudding user response:
Here is another way without using apply/agg:
Convert to datetime first:
df["Starttime"] = pd.to_datetime(df["Starttime"], format="%H:%M:%S")
df["Endtime"] = pd.to_datetime(df["Endtime"], format="%H:%M:%S")
Get difference between the end and start times and then using index.repeat
, repeat the rows. Then using groupby & cumcount, get pd.to_timedelta
in minutes and add to the existing start time:
repeats = df['Endtime'].sub(df['Starttime']).dt.total_seconds()//60
out = df.loc[df.index.repeat(repeats 1),['Lap','Starttime']]
out['Starttime'] = (out['Starttime'].add(
pd.to_timedelta(out.groupby("Lap").cumcount(),'min')).dt.time)
print(out)
Lap Starttime
0 1 10:00:00
0 1 10:01:00
0 1 10:02:00
0 1 10:03:00
0 1 10:04:00
0 1 10:05:00