Filling DataFrame with missing ranges present in two columns-CodePudding

I have a dataframe which contains starting and ending Timestamp acting as range for audio clipping, which can be generated like this:

import pandas as pd

df = pd.DataFrame( 
{'start': 
  {0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'),2: pd.Timestamp('1900-01-01 00:00:18.743000'), 3: pd.Timestamp('1900-01-01 00:00:21.383000'), 4: pd.Timestamp('1900-01-01 00:00:22.812000')}, 
'end': 
  {0: pd.Timestamp('1900-01-01 00:00:16.342000'), 1: pd.Timestamp('1900-01-01 00:00:18.543000'), 2: pd.Timestamp('1900-01-01 00:00:20.712000'), 3: pd.Timestamp('1900-01-01 00:00:22.482000'), 4: pd.Timestamp('1900-01-01 00:00:24.653000')}})

                      start                     end
0   1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
1   1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
2   1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
3   1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
4   1900-01-01 00:00:22.812 1900-01-01 00:00:24.653

And I want to generate a dataframe filling starting and ending timestamp where these timestamp don't come, meaning ranges where these entries is not present. So something like this:

pd.DataFrame( 
{'start': 
  {0: pd.Timestamp('1900-01-01 00:00:00.000000'), 1: pd.Timestamp('1900-01-01 00:00:14.373000'), 2: pd.Timestamp('1900-01-01 00:00:16.342000'), 3: pd.Timestamp('1900-01-01 00:00:18.543000'), 4: pd.Timestamp('1900-01-01 00:00:20.712000'), 5: pd.Timestamp('1900-01-01 00:00:21.383000'), 6: pd.Timestamp('1900-01-01 00:00:22.482000'), 7: pd.Timestamp('1900-01-01 00:00:22.812000')}, 
'end': 
  {0: pd.Timestamp('1900-01-01 00:00:14.373000'), 1: pd.Timestamp('1900-01-01 00:00:16.342000'), 2: pd.Timestamp('1900-01-01 00:00:18.543000'), 3: pd.Timestamp('1900-01-01 00:00:20.712000'), 4: pd.Timestamp('1900-01-01 00:00:21.383000'), 5: pd.Timestamp('1900-01-01 00:00:22.482000'), 6: pd.Timestamp('1900-01-01 00:00:22.812000'), 7: pd.Timestamp('1900-01-01 00:00:24.653000')}})

                      start                     end
0   1900-01-01 00:00:00.000 1900-01-01 00:00:14.373
1   1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2   1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3   1900-01-01 00:00:18.543 1900-01-01 00:00:20.712
4   1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
5   1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
6   1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
7   1900-01-01 00:00:22.812 1900-01-01 00:00:24.653

I couldn't come with any feasible solutions other than iterating individual rows, what could be the best way to do this?

CodePudding user response：

IIUC, you could get all unique timestamps and generate a new dataframe from the shifted values:

vals = df[['start', 'end']].stack().unique()
vals2 = np.concatenate([np.array([0], dtype=vals.dtype), vals])
df2 = pd.DataFrame(zip(vals2, vals), columns=['start', 'end'])

output:

                    start                     end
0 1970-01-01 00:00:00.000 1900-01-01 00:00:14.373
1 1900-01-01 00:00:14.373 1900-01-01 00:00:16.342
2 1900-01-01 00:00:16.342 1900-01-01 00:00:18.543
3 1900-01-01 00:00:18.543 1900-01-01 00:00:18.743
4 1900-01-01 00:00:18.743 1900-01-01 00:00:20.712
5 1900-01-01 00:00:20.712 1900-01-01 00:00:21.383
6 1900-01-01 00:00:21.383 1900-01-01 00:00:22.482
7 1900-01-01 00:00:22.482 1900-01-01 00:00:22.812
8 1900-01-01 00:00:22.812 1900-01-01 00:00:24.653