I have a table with time points (timestamp). If less than a minute has passed between the points, I need to sum the timedelta and update the point's datetime boundaries.
For the rows where 'time_break' is less than 60 I need the following:
- sum 'time_delta' with 'time_delta' from the previous row;
- update 'ts_start' with the value 'ts_start' from the previous row;
- calculate new 'time_break', so it's correct for updated time points
Input data:
df = pd.DataFrame({'ts_start': [1647854644, 1647855323, 1647855454, 1647855521, 1647858807, 1647858858, 1647858970],
'ts_end': [1647854699, 1647855421, 1647855521, 1647856205, 1647858810, 1647858958, 1647859020],
'time_break': [105.0, 624.0, 33.0, 0.0, 2602.0, 48.0, 12.0],
'time_delta': [55, 98, 67, 625, 3, 100, 50]})
Expected output:
df_out = pd.DataFrame({'ts_start': [1647854644, 1647855323, 1647858807],
'ts_end': [1647854699, 1647856205, 1647859020],
'time_break': [105.0, 624.0, 2602.0],
'time_delta': [55, 790, 153]})
But I understand it's working slowly and it misses doubled time_breaks. I think there's a possibility of using groupby, but I can't come up with a solution.
Would appreciate any help, thanks!
I've tried this:
for i in range(1, len(df)):
try:
if 0 <= df.iloc[i, df.columns.get_loc('time_break')] <= 60:
df.iloc[i, df.columns.get_loc('time_delta')] = df.iloc[i-1, df.columns.get_loc('time_delta')]
df.iloc[i, df.columns.get_loc('ts_start')] = df.iloc[i-1, df.columns.get_loc('ts_start')]
df.iloc[i, df.columns.get_loc('time_break')] = df.iloc[i, df.columns.get_loc('ts_start')] - df.iloc[i-2, df.columns.get_loc('ts_end')]
df.drop(index=i-1, inplace=True)
df = df.reset_index(drop=True)
except IndexError:
break;
My output: enter image description here
CodePudding user response:
Try:
x = (
df.groupby((df["time_break"] >= 60).cumsum())
.agg(
{
"ts_start": "first",
"ts_end": "last",
"time_break": "first",
"time_delta": "sum",
}
)
.reset_index(drop=True)
)
print(x)
Prints:
ts_start ts_end time_break time_delta
0 1647854644 1647854699 105.0 55
1 1647855323 1647856205 624.0 790
2 1647858807 1647859020 2602.0 153