I have a dataframe which contains starting and ending Timestamp acting as range for audio clipping, which can be generated like this:
import pandas as pd
df = pd.DataFrame(
{'start':
{0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'),2: pd.Timestamp('1900-01-01 00:00:18.743000'), 3: pd.Timestamp('1900-01-01 00:00:21.383000'), 4: pd.Timestamp('1900-01-01 00:00:22.812000')},
'end':
{0: pd.Timestamp('1900-01-01 00:00:16.342000'), 1: pd.Timestamp('1900-01-01 00:00:18.543000'), 2: pd.Timestamp('1900-01-01 00:00:20.712000'), 3: pd.Timestamp('1900-01-01 00:00:22.482000'), 4: pd.Timestamp('1900-01-01 00:00:24.653000')}})
start end
0 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
1 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
2 1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
3 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
4 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653
And I want to generate a dataframe filling starting and ending timestamp where these timestamp don't come, meaning ranges where these entries is not present. So something like this:
pd.DataFrame(
{'start':
{0: pd.Timestamp('1900-01-01 00:00:00.000000'), 1: pd.Timestamp('1900-01-01 00:00:14.373000'), 2: pd.Timestamp('1900-01-01 00:00:16.342000'), 3: pd.Timestamp('1900-01-01 00:00:18.543000'), 4: pd.Timestamp('1900-01-01 00:00:20.712000'), 5: pd.Timestamp('1900-01-01 00:00:21.383000'), 6: pd.Timestamp('1900-01-01 00:00:22.482000'), 7: pd.Timestamp('1900-01-01 00:00:22.812000')},
'end':
{0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'), 2: pd.Timestamp('1900-01-01 00:00:18.543000'), 3: pd.Timestamp('1900-01-01 00:00:20.712000'), 4: pd.Timestamp('1900-01-01 00:00:21.383000'), 5: pd.Timestamp('1900-01-01 00:00:22.482000'), 6: pd.Timestamp('1900-01-01 00:00:22.812000'), 7: pd.Timestamp('1900-01-01 00:00:24.653000')}})
start end
0 1900-01-01 00:00:00.000 1900-01-01 00:00:14.373
1 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3 1900-01-01 00:00:18.543 1900-01-01 00:00:20.712
4 1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
5 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
6 1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
7 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653
I couldn't come with any feasible solutions other than iterating individual rows, what could be the best way to do this?
CodePudding user response:
IIUC, you could get all unique timestamps and generate a new dataframe from the shifted values:
vals = df[['start', 'end']].stack().unique()
vals2 = np.concatenate([np.array([0], dtype=vals.dtype), vals])
df2 = pd.DataFrame(zip(vals2, vals), columns=['start', 'end'])
output:
start end
0 1970-01-01 00:00:00.000 1900-01-01 00:00:14.373
1 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3 1900-01-01 00:00:18.543 1900-01-01 00:00:18.743
4 1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
5 1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
6 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
7 1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
8 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653