I am generating data with a timestamp (counting up). I then want to separate the array into windows based on the timestamp and calculate the mean of the data in each window. My new array then has a new "timestamp" (the window index) and the calculated mean data.
My code works as intended, but I believe there is a more NumPy-like way. I think the while loop could be removed and np.where could check the whole array at once, since it is already sorted.
Thanks for your help.
import numpy as np
# generating test data: first row timestamps (always counting up), second row random data
data = np.array([np.cumsum(np.random.randint(100, size=20)), np.random.randint(1, 5, size=20)])
print(data)
window_size = 200
overlap = 100
i, l_lim, u_lim = 0, 0, window_size
timestamps = []
window_mean = []
while u_lim < data[0, -1]:
    # mean of the data whose timestamp lies in (l_lim, u_lim]
    window_mean.append(np.mean(data[1, np.where((data[0, :] > l_lim) & (data[0, :] <= u_lim))]))
    timestamps.append(i)
    # advance the window by window_size - overlap
    l_lim = u_lim - overlap
    u_lim = l_lim + window_size
    i += 1
print(np.array([timestamps, window_mean]))
CodePudding user response:
While I may have reduced the number of lines of code, I do not think I have improved it that much. The main difference is the method of iteration: the loop index is used directly to define the selection boundaries. Otherwise, I could not see a way to improve on your code. Here is my attempt, for what it is worth:
Code:
import numpy as np
np.random.seed(5)
data = np.array([np.cumsum(np.random.randint(100, size=20)), np.random.randint(1, 5, size=20)])
print("Data:", data)
window_size = 200
overlap = 100
for i in range((max(data[0]) // (window_size-overlap)) + 1):
    result = np.mean(data[1, np.where((data[0] > i*(window_size-overlap)) & (data[0] <= (i*(window_size-overlap)) + window_size))])
    print(f"{i}: {result:.2f}")
Output:
Data: [[ 99 177 238 254 327 335 397 424 454 534 541 617 632 685 765 792 836 913 988 1053]
[ 4 3 1 3 2 3 3 2 2 3 2 2 3 2 3 4 1 3 2 3]]
0: 3.50
1: 2.33
2: 2.40
3: 2.40
4: 2.25
5: 2.40
6: 2.80
7: 2.67
8: 2.00
9: 2.67
10: 3.00
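For completeness, here is one possible loop-free variant. It is a sketch, not a tested drop-in replacement. It assumes the timestamps are sorted (as in the question) and uses the same windows as the loop above, i.e. window i covers (i*step, i*step + window_size] with step = window_size - overlap. Because the timestamps are sorted, every window is a contiguous slice, so np.searchsorted can find all slice boundaries at once and a cumulative sum gives the per-window sums. The function name windowed_means and its parameters are made up for this illustration.

```python
import numpy as np

def windowed_means(timestamps, values, window_size, overlap):
    """Mean of `values` per window (lo, lo + window_size]; hypothetical helper."""
    step = window_size - overlap
    # Same number of windows as the for loop above: max timestamp // step + 1
    starts = np.arange(timestamps[-1] // step + 1) * step
    # Sorted timestamps: each window is the contiguous slice [left, right)
    left = np.searchsorted(timestamps, starts, side="right")                 # first index with t > lo
    right = np.searchsorted(timestamps, starts + window_size, side="right")  # first index with t > hi
    # Prefix sums let us get every slice sum with one subtraction
    csum = np.concatenate(([0], np.cumsum(values)))
    sums = csum[right] - csum[left]
    counts = right - left
    # Empty windows stay NaN instead of raising a division warning
    means = np.full(len(starts), np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

# Data copied from the printed output above
data = np.array([
    [99, 177, 238, 254, 327, 335, 397, 424, 454, 534,
     541, 617, 632, 685, 765, 792, 836, 913, 988, 1053],
    [4, 3, 1, 3, 2, 3, 3, 2, 2, 3, 2, 2, 3, 2, 3, 4, 1, 3, 2, 3],
])
means = windowed_means(data[0], data[1], window_size=200, overlap=100)
for i, m in enumerate(means):
    print(f"{i}: {m:.2f}")
```

On the data shown above this should give the same per-window values as the loop. Whether it is actually faster depends on the array size; for 20 samples the loop is perfectly fine.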