We receive data from multiple hardware sources in a defined data structure. One of the fields in the datastructure is the timestamp which we will use to find the mean timestamp such that the variance is lowest.
source one timestamp X = [x1, x2, x3, x4, x5]
source two timestamp Y = [y1, y2, y3, y4, y5]
source three timestamp Z = [z1, z2, z3, z4, z5]
What I want to find is a time stamp per source which gives the smallest variance for any combination.
find x_i, y_j, z_k such that the variance(x_i, y_j, z_k) is the smallest for any i, j, k
Effectively what that means is I am searching for the three time stamps that are closest to each other
CodePudding user response:
If you were given the mean, then your task would be simple -- just choose timestamp from each array that is closest to the mean.
You are not given the mean, but imagine a value moving up the real line, and choosing the timestamps from each sorted array that are closest to it. Your choice from any array will only change at the midpoint between two adjacent values.
An O(n) algorithm for this problem is to start with the first elements in each array, and then repeatedly advance in the array with the smallest midpoint between the current element and the next. In this way you will consider all possible sets of closest elements to any point. In python, it looks like this:
import statistics
def sync(X,Y,Z):
arrays=[X,Y,Z]
cursors=[0,0,0]
bestvar = None
bestvals = None
while True:
vals = [(arrays[i][cursors[i]]) for i in range(len(arrays))]
variance = statistics.variance(vals)
if bestvar == None or variance < bestvar:
bestvar = variance
bestvals = vals
# find best array to advance in
nextmid = None
nextarray = None
for i in range(len(arrays)):
ar = arrays[i]
pos = cursors[i]
if len(ar) > pos 1:
mid = ar[pos] (ar[pos 1]-ar[pos])/2
if nextmid == None or mid < nextmid:
nextmid = mid
nextarray = i
if nextarray == None:
break
cursors[nextarray] = 1
return bestvals