I've got a df containing timestamps and separate values. The timestamps are recorded at 100 ms intervals (10 rows per second). I want to subset specific timepoints plus the previous rows within that second.
Using the code below, I get the timestamps of interest. I then subtract 0.9 seconds from each and concat the result back to the original df. However, I only want the timepoints within that one-second window, and then to skip straight to the next timestamp and the timepoints within its window.
import pandas as pd

df = pd.DataFrame({
    'Time' : ['2021-03-20 09:27:28.400','2021-03-20 09:29:15.200','2021-03-20 09:30:38.200'],
    'Label' : ['A','B','A'],
})
df['Time'] = pd.to_datetime(df['Time'])
df_prev = df.copy()
df_prev['Time'] = df_prev['Time'] - pd.Timedelta('0.9sec')
df_prev = df_prev[['Time']]
df_out = pd.concat([df, df_prev]).sort_values(by = 'Time').reset_index(drop = True)
df_out = (df_out.set_index(['Time', df_out.groupby('Time').cumcount()])
          .unstack()
          .asfreq('0.1S', method = 'pad')
          .stack(dropna = False)
          .reset_index(level = 1, drop = True)
          .reset_index()
          )
Intended output:
Time Label
1 2021-03-20 09:27:27.500 NaN
2 2021-03-20 09:27:27.600 NaN
3 2021-03-20 09:27:27.700 NaN
4 2021-03-20 09:27:27.800 NaN
5 2021-03-20 09:27:27.900 NaN
6 2021-03-20 09:27:28.000 NaN
7 2021-03-20 09:27:28.100 NaN
8 2021-03-20 09:27:28.200 NaN
9 2021-03-20 09:27:28.300 NaN
10 2021-03-20 09:27:28.400 A
11 2021-03-20 09:29:14.300 NaN
12 2021-03-20 09:29:14.400 NaN
13 2021-03-20 09:29:14.500 NaN
14 2021-03-20 09:29:14.600 NaN
15 2021-03-20 09:29:14.700 NaN
16 2021-03-20 09:29:14.800 NaN
17 2021-03-20 09:29:14.900 NaN
18 2021-03-20 09:29:15.000 NaN
19 2021-03-20 09:29:15.100 NaN
20 2021-03-20 09:29:15.200 B
21 2021-03-20 09:30:37.300 NaN
22 2021-03-20 09:30:37.400 NaN
23 2021-03-20 09:30:37.500 NaN
24 2021-03-20 09:30:37.600 NaN
25 2021-03-20 09:30:37.700 NaN
26 2021-03-20 09:30:37.800 NaN
27 2021-03-20 09:30:37.900 NaN
28 2021-03-20 09:30:38.000 NaN
29 2021-03-20 09:30:38.100 NaN
30 2021-03-20 09:30:38.200 A
CodePudding user response:
One way is to build a list of dates and do an outer merge with the original df:
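# start of each window: 900 ms before the event time
prev = df.Time - pd.Timedelta('900ms')

# build new dates: 10 evenly spaced timestamps from each start to its event time
new_values = pd.concat(
    pd.date_range(start, end, periods=10, name='Time').to_series(index=None)
    for start, end in zip(prev, df.Time)
)
new_values.index = range(len(new_values))

# the outer merge keeps both the generated timestamps and the labelled rows
df.merge(new_values, on='Time', how='outer', sort=True)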
Out[286]:
Time Label
0 2021-03-20 09:27:27.500 NaN
1 2021-03-20 09:27:27.600 NaN
2 2021-03-20 09:27:27.700 NaN
3 2021-03-20 09:27:27.800 NaN
4 2021-03-20 09:27:27.900 NaN
5 2021-03-20 09:27:28.000 NaN
6 2021-03-20 09:27:28.100 NaN
7 2021-03-20 09:27:28.200 NaN
8 2021-03-20 09:27:28.300 NaN
9 2021-03-20 09:27:28.400 A
10 2021-03-20 09:29:14.300 NaN
11 2021-03-20 09:29:14.400 NaN
12 2021-03-20 09:29:14.500 NaN
13 2021-03-20 09:29:14.600 NaN
14 2021-03-20 09:29:14.700 NaN
15 2021-03-20 09:29:14.800 NaN
16 2021-03-20 09:29:14.900 NaN
17 2021-03-20 09:29:15.000 NaN
18 2021-03-20 09:29:15.100 NaN
19 2021-03-20 09:29:15.200 B
20 2021-03-20 09:30:37.300 NaN
21 2021-03-20 09:30:37.400 NaN
22 2021-03-20 09:30:37.500 NaN
23 2021-03-20 09:30:37.600 NaN
24 2021-03-20 09:30:37.700 NaN
25 2021-03-20 09:30:37.800 NaN
26 2021-03-20 09:30:37.900 NaN
27 2021-03-20 09:30:38.000 NaN
28 2021-03-20 09:30:38.100 NaN
29 2021-03-20 09:30:38.200 A
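If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch, not the one right way to do it: the function name `expand_previous_rows` and its `step` and `n_rows` parameters are made up for illustration, and it assumes the data really is on a fixed 100 ms grid. It uses `pd.date_range(end=..., periods=..., freq=...)` rather than `start`/`end`/`periods`, and drops duplicate timestamps in case two events sit closer together than one window.

```python
import pandas as pd

def expand_previous_rows(df, time_col="Time", step="100ms", n_rows=10):
    # Hypothetical helper: for each event time t, generate n_rows timestamps
    # ending at t, i.e. t - (n_rows - 1) * step, ..., t - step, t.
    stamps = [
        ts
        for t in df[time_col]
        for ts in pd.date_range(end=t, periods=n_rows, freq=step)
    ]
    # De-duplicate in case two events are closer together than one window.
    grid = pd.DataFrame({time_col: pd.DatetimeIndex(stamps)}).drop_duplicates()
    # Left merge: every generated timestamp is kept, and the other columns
    # are filled only on the original event rows.
    out = grid.merge(df, on=time_col, how="left")
    return out.sort_values(time_col).reset_index(drop=True)

df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2021-03-20 09:27:28.400",
        "2021-03-20 09:29:15.200",
        "2021-03-20 09:30:38.200",
    ]),
    "Label": ["A", "B", "A"],
})

result = expand_previous_rows(df)
print(result)
```

With the sample df this should give the same 30 rows as the outer merge above. Because the event time itself is always the last timestamp in its window, a left merge from the generated grid is enough here; the outer merge is only needed when the original rows might not land on the generated grid.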