Subset df on specific timestamps and previous seconds - python

Time:10-28

I've got a df containing timestamps and associated values. The timestamps have millisecond precision, with 10 rows per second (one every 100 ms). I want to subset specific timepoints plus the preceding rows within that second.

Using the code below, I get the timestamps of interest. I then subtract 0.9 seconds from each and concat the result back onto the original df. However, I want to include every timepoint within the one-second window ending at each timestamp, then skip to the next timestamp and do the same for its window.

import pandas as pd

df = pd.DataFrame({
    'Time' : ['2021-03-20 09:27:28.400','2021-03-20 09:29:15.200','2021-03-20 09:30:38.200'],
    'Label' : ['A','B','A'],
   })

df['Time'] = pd.to_datetime(df['Time'])

df_prev = df.copy()

df_prev['Time'] = df_prev['Time'] - pd.Timedelta('0.9sec')

df_prev = df_prev[['Time']]

df_out = pd.concat([df, df_prev]).sort_values(by = 'Time').reset_index(drop = True)

df_out = (df_out.set_index(['Time', df_out.groupby('Time').cumcount()])
            .unstack()
            .asfreq('0.1S', method = 'pad')
            .stack(dropna = False) 
            .reset_index(level = 1, drop = True)
            .reset_index()
            )

Intended output:

                      Time Label
1  2021-03-20 09:27:27.500   NaN
2  2021-03-20 09:27:27.600   NaN
3  2021-03-20 09:27:27.700   NaN
4  2021-03-20 09:27:27.800   NaN
5  2021-03-20 09:27:27.900   NaN
6  2021-03-20 09:27:28.000   NaN
7  2021-03-20 09:27:28.100   NaN
8  2021-03-20 09:27:28.200   NaN
9  2021-03-20 09:27:28.300   NaN
10 2021-03-20 09:27:28.400     A
11 2021-03-20 09:29:14.300   NaN
12 2021-03-20 09:29:14.400   NaN
13 2021-03-20 09:29:14.500   NaN
14 2021-03-20 09:29:14.600   NaN
15 2021-03-20 09:29:14.700   NaN
16 2021-03-20 09:29:14.800   NaN
17 2021-03-20 09:29:14.900   NaN
18 2021-03-20 09:29:15.000   NaN
19 2021-03-20 09:29:15.100   NaN
20 2021-03-20 09:29:15.200     B
21 2021-03-20 09:30:37.300   NaN
22 2021-03-20 09:30:37.400   NaN
23 2021-03-20 09:30:37.500   NaN
24 2021-03-20 09:30:37.600   NaN
25 2021-03-20 09:30:37.700   NaN
26 2021-03-20 09:30:37.800   NaN
27 2021-03-20 09:30:37.900   NaN
28 2021-03-20 09:30:38.000   NaN
29 2021-03-20 09:30:38.100   NaN
30 2021-03-20 09:30:38.200     A

CodePudding user response:

One way is to build the list of timestamps for each window and do an outer merge with the original df:

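# start of each window: 900 ms before the labelled timestamp
prev = df.Time - pd.Timedelta('900ms')

# build 10 evenly spaced timestamps (100 ms apart) per window
new_values = pd.concat(pd.date_range(start, end,
                                     periods=10,
                                     name='Time').to_series(index=None)
                       for start, end in zip(prev, df.Time))

# replace the datetime index with a plain integer index
new_values.index = range(len(new_values))

# the outer merge keeps every generated timestamp; Label is NaN except on the original rows
df.merge(new_values, on='Time', how='outer', sort=True)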
Out[286]:
                      Time Label
0  2021-03-20 09:27:27.500   NaN
1  2021-03-20 09:27:27.600   NaN
2  2021-03-20 09:27:27.700   NaN
3  2021-03-20 09:27:27.800   NaN
4  2021-03-20 09:27:27.900   NaN
5  2021-03-20 09:27:28.000   NaN
6  2021-03-20 09:27:28.100   NaN
7  2021-03-20 09:27:28.200   NaN
8  2021-03-20 09:27:28.300   NaN
9  2021-03-20 09:27:28.400     A
10 2021-03-20 09:29:14.300   NaN
11 2021-03-20 09:29:14.400   NaN
12 2021-03-20 09:29:14.500   NaN
13 2021-03-20 09:29:14.600   NaN
14 2021-03-20 09:29:14.700   NaN
15 2021-03-20 09:29:14.800   NaN
16 2021-03-20 09:29:14.900   NaN
17 2021-03-20 09:29:15.000   NaN
18 2021-03-20 09:29:15.100   NaN
19 2021-03-20 09:29:15.200     B
20 2021-03-20 09:30:37.300   NaN
21 2021-03-20 09:30:37.400   NaN
22 2021-03-20 09:30:37.500   NaN
23 2021-03-20 09:30:37.600   NaN
24 2021-03-20 09:30:37.700   NaN
25 2021-03-20 09:30:37.800   NaN
26 2021-03-20 09:30:37.900   NaN
27 2021-03-20 09:30:38.000   NaN
28 2021-03-20 09:30:38.100   NaN
29 2021-03-20 09:30:38.200     A
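
The same result can probably be reached without the Python-level loop by broadcasting the offsets against the timestamps with NumPy. This is only a sketch of an alternative, not the approach above: it assumes a fixed 100 ms sampling step, and the names `offsets`, `grid`, `windows` and `out` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2021-03-20 09:27:28.400',
                            '2021-03-20 09:29:15.200',
                            '2021-03-20 09:30:38.200']),
    'Label': ['A', 'B', 'A'],
})

# Offsets 900 ms, 800 ms, ..., 0 ms (assumes a 100 ms sampling step).
offsets = pd.to_timedelta(np.arange(900, -1, -100), unit='ms')

# Subtract every offset from every timestamp: result has shape (len(df), 10),
# one row per labelled timestamp, ordered oldest to newest within the row.
grid = df['Time'].to_numpy()[:, None] - offsets.to_numpy()[None, :]

# Flatten row by row so each window stays contiguous.
windows = pd.DataFrame({'Time': grid.ravel()})

# A left merge keeps every generated timestamp and attaches Label where it exists.
out = windows.merge(df, on='Time', how='left')
```

If two labelled timestamps are less than a second apart, their windows overlap and `windows` will contain duplicate times, so a `drop_duplicates('Time')` before the merge may be needed. If the full 10 Hz recording already sits in its own frame (say a hypothetical `full`), something like `full[full['Time'].isin(windows['Time'])]` should select the same rows instead of generating them.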