i have a dataframe, namely data, with a datetime index and the below columns :
id activity x y z
datetime
1970-01-01 00:42:00.219142823 1623 A -0.152512 -8.585220 -1.219192
1970-01-01 00:42:00.269496827 1623 A 0.999466 -8.196548 -0.758926
1970-01-01 00:42:00.319850830 1623 A 0.450241 -8.701187 -1.290024
1970-01-01 00:42:00.370204834 1623 A -0.042175 -9.739563 -1.787415
1970-01-01 00:42:00.420558838 1623 A 3.551483 -10.745132 -1.266403
... ... ... ... ... ...
1970-01-22 01:26:29.872699000 1644 A 2.239343 -8.408914 2.074087
1970-01-22 01:26:29.892898000 1644 A 2.548301 -8.157437 1.820215
1970-01-22 01:26:29.912994000 1644 A 2.636917 -7.786209 2.057322
1970-01-22 01:26:29.933195000 1644 A 2.545906 -7.743098 1.801055
1970-01-22 01:26:29.953291000 1644 A 2.373464 -8.071217 1.585503
279817 rows × 5 columns
every 119 rows i want to extract only the values of x, y, z columns, as well with the activity label, and put them in a new dataframe row by row. the values of each column followed up by the next column. like below :
values of column x|values of column y|values of column z|activity
next row after 119 rows of values of the dataframe data
values of column x|values of column y|values of column z|activity
etc
any ideas would be very helpful and much appreciated.
Thanks in advance for your time!
CodePudding user response:
If your data frame is called data
, then you can use
data.iloc[::119, [data.columns.get_loc(col) for col in ['x', 'y', 'z', 'activity']]]
CodePudding user response:
Edit: after understanding that the question is really about getting data every 5 seconds, we can say so more directly:
wanted = ['x', 'y', 'z', 'activity']
newdf = df.resample('5s', origin='start').first()[wanted].copy()
Note: look at the various possibilities for the origin
argument. Using 'start'
means we start with the first index in df
. But usually it is more natural to used the default ('start_day'
) which makes "round datetimes".
Original answer:
newdf = df.iloc[::119][['x', 'y', 'z', 'activity']].copy()
Example (reproducible setup):
import numpy as np
np.random.seed(0)
n = 279817
df = pd.DataFrame({
'id': np.linspace(1623, 1644, n).round().astype(int),
'activity': ['A'] * n,
}, index=pd.date_range(start='1970-01-01 00:42:00', end='1970-01-22 01:26:30', periods=n))
df[list('xyz')] = np.random.normal(size=(n, 3))
# code above, then
>>> newdf
x y z activity
1970-01-01 00:42:00.000000000 1.764052 0.400157 0.978738 A
1970-01-01 00:54:52.762565400 -0.477974 -0.479656 0.620358 A
1970-01-01 01:07:45.525130800 1.327783 -0.101281 -0.803141 A
1970-01-01 01:20:38.287696200 -1.429991 -0.061638 -1.432735 A
1970-01-01 01:33:31.050261600 -1.109478 -0.547518 0.665967 A
... ... ... ... ...
1970-01-22 00:29:53.740994081 -0.267150 0.414945 -0.627917 A
1970-01-22 00:42:46.503559481 -0.001320 0.743854 0.137284 A
1970-01-22 00:55:39.266124882 -0.469361 -1.204360 -1.688222 A
1970-01-22 01:08:32.028690282 0.340768 -0.911023 0.069327 A
1970-01-22 01:21:24.791255682 -1.461045 1.938692 -0.188877 A
[2352 rows x 4 columns]