I am trying to resample some time series data in Python using the code below. I have two arrays: time, which holds the observation times, and values, which holds the observed values. I want to record observations at a fixed interval, say every 0.1, over the time range [0, 2]. For the example data that gives an array of [time, value] pairs that looks like [[0.0, 1], [1.0, 3], [1.2, 5], [1.3, 6], [2.0, 8]]. Then, at every intermediate time before the next update, I want to fill in the last recorded value. For example, between [0.0, 1] and [1.0, 3] I want the pairs [0.1, 1] ... [0.9, 1].
import numpy as np

time = np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2, 1.3, 1.32, 2])
values = np.array([1, 3, 2, 3, 4, 5, 6, 7, 8])
#time_interval = 1e-1
newvals = []

def resample(time, values, time_interval):
    currentTime = 0.0
    newvals = []
    vals = []
    # Pass 1: record the first observation at or after currentTime,
    # then move currentTime forward by one interval.
    for idx in range(len(time)):
        t = time[idx]
        data = values[idx]
        if t >= currentTime:
            newvals.append([max(t, currentTime), data])
            currentTime = max(t, currentTime) + time_interval
        else:
            continue
    #vals = []
    newvals_copy = np.array(newvals)
    seen = set(newvals_copy[:, 0])
    # Pass 2: fill the gaps between recorded times with the last recorded value.
    for idx in range(len(newvals)):
        rec_time = newvals[idx][0]
        #seen.add(rec_time)
        #print(idx, idx + 1, len(newvals))
        next_idx = idx + 1
        if next_idx == len(newvals):
            break
        else:
            #print(idx)
            next_time = newvals[idx + 1][0]
            intermediate_val = np.arange(rec_time, next_time, time_interval)
            #print(intermediate_val[])
            for value in intermediate_val:
                if value not in seen:
                    #print(idx, value)
                    newvals.append([value, newvals[idx][1]])
    newvals = np.array(newvals)
    newvals = np.unique(newvals, axis=0)
    return newvals  #, len(newvals)

print(resample(time, values, 1e-1))
#print(np.arange(0,2.1,0.1), len(np.arange(0,2.1,0.1)))
This test case produces the desired output:
[[0. 1. ]
[0.1 1. ]
[0.2 1. ]
[0.3 1. ]
[0.4 1. ]
[0.5 1. ]
[0.6 1. ]
[0.7 1. ]
[0.8 1. ]
[0.9 1. ]
[1. 3. ]
[1.1 3. ]
[1.2 5. ]
[1.3 6. ]
[1.4 6. ]
[1.5 6. ]
[1.6 6. ]
[1.7 6. ]
[1.8 6. ]
[1.9 6. ]
[2. 8. ]]
However, running real_data = resample(real_time, real_values, 1e-1), where
real_time = https://filedropper.com/d/s/7Q5IqtRzEh42p2oU0qTrbnuTb8iRK4
real_values = https://filedropper.com/d/s/hixCEY7QmqGcDBjEanhX2UgocKmV4K
gives
real_data[:,0] = [0. 0.1 0.10000947 0.20000947 0.20000981 0.30000981
0.30001438 0.40001438 0.40002427 0.50002427 0.50006157 0.60006157
0.6000784 0.7000784 0.70009857 0.80009857 0.80010533 0.90010533
0.90012181 1.00012181 1.00020397 1.10020397 1.1002512 1.2002512
1.20025706 1.30025706 1.30026507 1.40026507 1.4002744 1.5002744
1.50028026 1.60028026 1.60029566 1.70029566 1.70029855 1.80029855
1.80030084 1.90030084 1.90032208]
which is incorrect. The expected time column is the same as in the test case:
real_data[:,0]=[0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7
1.8 1.9 2. ]
I am guessing the error comes from how floating-point numbers are handled, but I am not sure how to remedy this.
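A small check of that guess, offered as a sketch rather than a diagnosis of the real data: grid times built by arithmetic are not always bit-identical to the decimal literals, so exact comparisons (a set membership test, or np.unique) can treat near-equal times as different. The array a and the interval 0.1 below are illustrative values only.

```python
import numpy as np

# Times built by multiplication can differ from the literal in the last bit.
print(0.1 * 3 == 0.3)    # False
print(12 * 0.1 == 1.2)   # False

# Exact comparison therefore sees two distinct times here.
a = np.array([0.3, 0.1 * 3])
n_exact = len(np.unique(a))
print(n_exact)           # 2

# Comparing integer tick indices (time / interval, rounded) avoids this.
ticks = np.rint(a / 0.1).astype(int)
n_ticks = len(np.unique(ticks))
print(n_ticks)           # 1
```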
EDIT
I tried rounding the times as suggested in the comments. This is better but not ideal:
real_data = resample(np.round(real_time, decimals=1), real_values, 1e-1)
still gives an array with a duplicate (0.8 appears twice):
real_data[:,0] = [0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.8 0.9 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. ]
Ideally I want to make this solution workable for arbitrary data sets and time interval values.
Answer:
Possibly something like this does what you need:
import numpy as np
from scipy.interpolate import interp1d
# The test data.
time = np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2, 1.3, 1.32, 2])
values = np.array([1, 3, 2, 3, 4, 5, 6, 7, 8])
# The new time basis we're aiming for.
t_new = np.linspace(0, 2, 21)
# Keep only the samples whose time lies exactly on the new grid.
# Note: `t in t_new` is an exact floating-point comparison.
new_data = [[t, v] for t, v in zip(time, values) if t in t_new]
t_clean, v_clean = np.array(new_data).T
# Make the interpolator function.
func = interp1d(t_clean, v_clean, kind="previous")
# Interpolate the data into the new time basis.
v_new = func(t_new)
Now v_new is:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3., 3., 3., 6., 6., 6., 6.,
6., 6., 6., 8.])
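Note that the entry for t = 1.2 in this result is 3, not the 5 the question asks for. A likely cause is that `t in t_new` compares floats exactly, and the grid value np.linspace produces near 1.2 is not bit-identical to the literal 1.2. For the same reason, real data whose times never land exactly on the grid would be discarded almost entirely. Below is a minimal sketch that avoids exact float comparison by working with integer tick indices. The helper name resample_on_grid is hypothetical, and the sketch assumes that time is sorted in ascending order, that each observation belongs to its nearest grid point, and that the first observation in each tick should be held until the next tick that has one.

```python
import numpy as np

def resample_on_grid(time, values, time_interval):
    """Hypothetical helper: sample-and-hold on a regular grid via integer ticks."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values)
    # Nearest grid tick for every observation (integers, so comparisons are exact).
    ticks = np.rint(time / time_interval).astype(np.int64)
    # Unique ticks and the index of the first observation falling in each.
    uniq_ticks, first_idx = np.unique(ticks, return_index=True)
    # Every tick from the first recorded one to the last.
    all_ticks = np.arange(uniq_ticks[0], uniq_ticks[-1] + 1)
    # For each grid tick, position of the latest recorded tick at or before it.
    pos = np.searchsorted(uniq_ticks, all_ticks, side="right") - 1
    held = values[first_idx[pos]]
    return np.column_stack((all_ticks * time_interval, held))

time = np.array([0, 0.0120, 0.0130, 1, 1.02, 1.2, 1.3, 1.32, 2])
values = np.array([1, 3, 2, 3, 4, 5, 6, 7, 8])
out = resample_on_grid(time, values, 0.1)
print(out)
```

On the test data this yields 21 rows, with the value column 1 for t = 0.0 to 0.9, 3 at 1.0 and 1.1, 5 at 1.2, 6 from 1.3 to 1.9, and 8 at 2.0. The time column is rebuilt as tick * time_interval, so it can still differ from the decimal literal in the last bit (for example 3 * 0.1); round it for display if that matters. If you would rather assign each observation to the grid point at or before it, np.floor could replace np.rint, at the cost of being more sensitive to times that sit a hair below a grid point.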