Let's assume I have an infinite stream of floats s_i. I want to calculate the average of the last 10000 values.
There is an easy but inefficient solution to this problem: store the last 10000 values in a list/queue and compute the average by iterating over all elements every time, which results in a runtime of O(n), where n is the length of the list. This solution is too inefficient for my use case.
The fast solution to this problem is to introduce a variable that holds the sum of the last 10000 elements. Every time a new element arrives, it is added to the variable, and the oldest element (stored in a queue) is subtracted from it. This solution is O(1) per update. (Yes, memory is still O(n), but memory is not an issue at all.)
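Roughly, the fast solution looks like this (just a simplified sketch; the class and variable names are placeholders):

import java.util.ArrayDeque;

/** Sketch of the O(1)-per-update running sum over a sliding window. */
class RunningAverage {
    private final int windowSize;
    private final ArrayDeque<Float> window = new ArrayDeque<>();
    private float sum = 0;

    RunningAverage(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Add a new value; once the window is full, evict and subtract the oldest value. */
    void add(float value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst(); // this subtraction is where the drift creeps in
        }
    }

    float average() {
        return sum / window.size();
    }
}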
However, there is a problem with the "fast solution": due to floating-point imprecision and rounding errors, the sum drifts and becomes less accurate over time. This MVCE in Java shows the problem:
float sum = 0;
List<Float> nums = new ArrayList<>();
for (int i = 1; i <= 10000; i++) {
    nums.add((float) Math.random());
}
for (float f : nums) {
    sum += f;
}
for (float f : nums) {
    sum -= f;
}
System.out.println(sum); // Not printing zero
Please note that using a double instead won't solve this problem; it will just make it less significant.
So what is an elegant solution to this problem?
CodePudding user response:
Here are two solutions. At the bottom is code demonstrating an O(1) per element solution in which each output is computed only by summing elements from its dependent inputs. It has no rounding errors accumulated from ongoing sums. (This means it can be used generally for any reduction operation, such as finding maximums or minimums, not just addition.)
The float format spans 277 bits, from 2^-149 (the least significant bit of a subnormal number) to 2^127 (the most significant bit [MSB] of the format's largest finite number). 10,000 times the largest finite number is below 2^14 * 2^128 = 2^142, so its MSB is at most 2^141, and the sum therefore spans at most 141 - (-149) + 1 = 291 bits. Thus, with an array of 291 bits, plus a sign bit, you can track the exact sum of 10,000 floating-point numbers, with no rounding errors.
(Note that the implementation can include some slack to avoid excessive carrying. For example, if the 291 bits were implemented as, say, an array of ten 32-bit elements, fully using the first nine elements and 3 bits of the last element (plus another for the sign), then, in the worst case, a carry from the low element could cascade across all higher elements. Instead, one might use, say, twenty 32-bit elements but primarily use only around 16 bits of each element, allowing carries to accumulate within that element instead of being added to the higher element. Reconciling the slack would only be necessary when it has accumulated so much as to need actual carrying, or when checking the high elements to read the current total.)
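To make this concrete, here is a rough Java sketch of the same idea (Java rather than C, since the question is in Java). It uses BigInteger as the wide accumulator instead of a hand-rolled 291-bit array, so the slack optimization above is not shown. Every finite float is represented exactly as an integer multiple of 2^-149, and because the accumulator is exact, subtracting the expired value cannot introduce drift:

import java.math.BigInteger;
import java.util.ArrayDeque;

/** Sketch: exact sliding-window sum by scaling every float to an integer multiple of 2^-149. */
class ExactWindowSum {
    private final int windowSize;
    private final ArrayDeque<BigInteger> window = new ArrayDeque<>();
    private BigInteger sum = BigInteger.ZERO;

    ExactWindowSum(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Value of f in units of 2^-149; exact for every finite float (no NaN/Infinity handling). */
    private static BigInteger scaled(float f) {
        int bits = Float.floatToRawIntBits(f);
        int mantissa = bits & 0x7FFFFF;
        int rawExp = (bits >>> 23) & 0xFF;
        BigInteger magnitude = (rawExp == 0)
            ? BigInteger.valueOf(mantissa)                                    // subnormal: mantissa * 2^-149
            : BigInteger.valueOf(mantissa | 0x800000).shiftLeft(rawExp - 1);  // normal number
        return bits < 0 ? magnitude.negate() : magnitude;
    }

    void add(float value) {
        BigInteger s = scaled(value);
        window.addLast(s);
        sum = sum.add(s); // exact
        if (window.size() > windowSize) {
            sum = sum.subtract(window.removeFirst()); // exact, so no drift
        }
    }

    /** The only rounding happens here, when converting the exact sum back to double. */
    double average() {
        return Math.scalb(sum.doubleValue(), -149) / window.size();
    }
}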
A second method can provide O(N) complexity for N elements, so O(1) per element (for a fixed window size). For this, let's start with the first 10,000 elements in an array I, from I[0] to I[9999]. Accumulate partial sums working backwards: given some array T (this can be a temporary array, or the work can be done in the output array), set T[9999] = I[9999], T[9998] = T[9999] + I[9998], and so on; generally, T[i] = T[i+1] + I[i]. When we are done, T[0] will contain the sum of the first 10,000 inputs, so it is our first output.

Now note that T[1] is the sum of the 9,999 inputs after the first, so T[1] + I[10000] is the second output. In fact, each T[i] computed so far is the prefix of a sum we desire; it is the sum of the initial 10,000-i elements of the window whose sum is output i. So what we want to complete these are matching suffixes. Each prefix computed so far ends at index 9,999; the last (highest-indexed) input element added to it was I[9999]. So, if we start a partial sum S with I[10000], it is the suffix we need to add to T[1]. Then adding I[10001] to S gives us the suffix we need to add to T[2], and so on. Thus each of the 9,999 outputs after the first can be computed by updating S with the new input and outputting the sum of S and T[i]. One final output from this block can be produced by updating S with another input and outputting it; at that point S is the suffix sum of 10,000 elements and needs a prefix of zero elements, so no addition with any T[i] is required.

Thus 10,000+1 outputs have been produced by the 9,999 additions used to compute the prefixes and the 9,999 additions used to update S, so just under two additions per output (plus one more addition per output to paste a prefix and a suffix together). Then this process can be repeated for another block of 10,000 values.
The outputs may still have rounding errors from floating-point arithmetic, but each is constructed from a bounded number of "unidirectional" additions involving only its own inputs. There are no subtractions of earlier elements from an ongoing sum that accumulates more rounding errors over time.
Here is demonstration code.
#include <stdlib.h>

#define Operation(a, b) ((a) + (b))

/*  Sliding window reduction.

    For 0 <= i < N, each C[i] is the result of a reduction operation of the W
    elements of A from A[i] to A[i+W-1]. The Operation macro defines the
    reduction operation.
*/
void SlidingWindowReduction(const float *A, float *C, size_t N, size_t W)
{
    //  Do whole segments of length W+1.
    for (; W < N; N -= W+1, A += W+1, C += W+1)
    {
        //  Build prefixes. C[W-w] receives Operation(A[W-w] to A[W-1]).
        float p = A[W-1];
        for (size_t w = 1; w < W; ++w)
        {
            C[W-w] = p;
            float t = A[W-1-w];
            p = Operation(p, t);
        }
        C[0] = p;

        //  Build suffixes and paste them onto prefixes.
        //  Each suffix is Operation(A[W] to A[W+w]).
        float s = A[W];
        for (size_t w = 1; w < W; ++w)
        {
            //  Paste prefix (w to W-1) with suffix (W to W+w-1).
            float p = C[w];
            C[w] = Operation(p, s);

            //  Build suffix.
            float t = A[W+w];
            s = Operation(s, t);
        }
        C[W] = s;
    }

    //  Do final segment.
    if (N)
    {
        //  Build prefixes. C[W-w] receives Operation(A[W-w] to A[W-1]).
        float p = A[W-1];
        for (size_t w = 1; w < W-N+1; ++w)  // Unstored prefixes.
        {
            float t = A[W-1-w];
            p = Operation(p, t);
        }
        for (size_t w = W-N+1; w < W; ++w)  // Stored prefixes.
        {
            C[W-w] = p;
            float t = A[W-1-w];
            p = Operation(p, t);
        }
        C[0] = p;

        //  Build suffixes and paste them onto prefixes.
        //  Each suffix is Operation(A[W] to A[W+w]).
        if (1 < N)
        {
            float s = A[W];
            for (size_t w = 1; w < N-1; ++w)
            {
                //  Paste prefix (w to W-1) with suffix (W to W+w-1).
                float p = C[w];
                C[w] = Operation(p, s);

                //  Build suffix.
                float t = A[W+w];
                s = Operation(s, t);
            }
            {
                float p = C[N-1];
                C[N-1] = Operation(p, s);
            }
        }
    }
}

#include <stdio.h>

#define NumberOf(a) (sizeof (a) / sizeof *(a))

int main(void)
{
    enum { W = 3 };  // Number of elements in window.
    float A[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
    float C[NumberOf(A) - W + 1] = { 0 };
    SlidingWindowReduction(A, C, NumberOf(C), 3);
    for (size_t c = 0; c < NumberOf(C); ++c)
        printf("C[%zu] = %g.\n", c, C[c]);
}
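With W = 3 and the sample inputs 0 through 12, the program prints C[0] = 3, C[1] = 6, and so on up to C[10] = 33; each C[i] is the sum of the three consecutive inputs A[i] through A[i+2].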
CodePudding user response:
You cannot fix the limited precision of floating-point arithmetic, but to avoid the drifting I have two ideas:
1. Do the fast version and accept the drifting. After a fixed number of elements, recompute the whole sum from scratch in O(n). This way you get O(1) most of the time and only occasionally O(n). I know this technically makes it an O(n) algorithm again, but with a very small constant factor (see the sketch after this list).
2. Convert the floats to a format that has no such arithmetic problems, like a fixed-point representation (this may only be applicable if your values are in a limited range).
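A rough sketch of the first idea, assuming an arbitrary refresh interval of 1,000 updates (tune it to your accuracy needs):

import java.util.ArrayDeque;

/** Sketch: running sum that is periodically recomputed to cancel accumulated drift. */
class SelfCorrectingAverage {
    private static final int REFRESH_INTERVAL = 1_000; // arbitrary choice for this sketch
    private final int windowSize;
    private final ArrayDeque<Float> window = new ArrayDeque<>();
    private float sum = 0;
    private int updatesSinceRefresh = 0;

    SelfCorrectingAverage(int windowSize) {
        this.windowSize = windowSize;
    }

    void add(float value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
        if (++updatesSinceRefresh >= REFRESH_INTERVAL) {
            // O(n) resync: discard the drifted sum and rebuild it from the stored values.
            double fresh = 0;
            for (float f : window) {
                fresh += f;
            }
            sum = (float) fresh;
            updatesSinceRefresh = 0;
        }
    }

    float average() {
        return sum / window.size();
    }
}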
CodePudding user response:
Have you tried BigDecimal? It will keep the maximum scale after add/subtract.
BigDecimal sum = BigDecimal.ZERO;
List<BigDecimal> nums = new ArrayList<>();
for (int i = 1; i <= 10000; i++) {
    nums.add(new BigDecimal(Math.random()));
}
for (BigDecimal f : nums) {
    sum = sum.add(f);
}
for (BigDecimal f : nums) {
    sum = sum.subtract(f);
}
System.out.println(sum.toPlainString());
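Applied to the sliding window from the question, the same idea keeps the O(1) update with no drift, at the cost of slower BigDecimal arithmetic. A minimal sketch (class name and rounding context are my own choices):

import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayDeque;

/** Sketch: O(1)-per-update sliding-window average using exact BigDecimal arithmetic. */
class BigDecimalWindowAverage {
    private final int windowSize;
    private final ArrayDeque<BigDecimal> window = new ArrayDeque<>();
    private BigDecimal sum = BigDecimal.ZERO;

    BigDecimalWindowAverage(int windowSize) {
        this.windowSize = windowSize;
    }

    void add(float value) {
        BigDecimal v = new BigDecimal(value); // exact decimal expansion of the float
        window.addLast(v);
        sum = sum.add(v);                     // exact
        if (window.size() > windowSize) {
            sum = sum.subtract(window.removeFirst()); // exact, so no drift
        }
    }

    double average() {
        return sum.divide(BigDecimal.valueOf(window.size()), MathContext.DECIMAL64).doubleValue();
    }
}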