Home > OS >  Preallocate or change size of vector
Preallocate or change size of vector

Time:05-10

I have a situation where I have a process which needs to "burn-in". This means that I

  1. Start with p values, p relatively small
  2. For n>p, generate nth value using most recently generated p values (e.g. p 1 generated from values 1 to p, p 2 generated from values 2, p 1, etc.)
  3. Repeat until n=N, where N large

Now, only the most recently generated p values will be useful to me, so there are two ways for me to implement this. I can either

  • Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value or,
  • Preallocate a large array of length N, where first p elements are initial values. At iteration n, mutate nth value with most recently generated value

There are pros and cons to both approaches.

Pros of the first, are that we only store most relevant values. Cons of the first are that we are changing the length of the vector at each iteration.

Pros of the second are that we preallocate all the memory we need. Cons of the second is that we store much more than we need.

What is the best way to proceed? Does it depend on what aspect of performance I most need to care about? What will be the quickest?

Cheers in advance.

edit: approximately, p is usually in the order of low tens, N can be several thousand

CodePudding user response:

The first solution has another huge cons: removing the first item of an array takes O(n) time since elements should be moved in memory. This certainly cause the algorithm to runs in quadratic time which is not reasonable. Shifting the items as proposed by @ForceBru should also cause this quadratic run time (since many items are moved just to add one value every time).

The second solution should be pretty fast compared to the first but, indeed, it can use a lot of memory so it should be sub-optimal (it takes time to write values in the RAM).

A faster solution is to use a data structure called a deque. Such data structure enable you to remove the first item in constant time and append a new value at the end also in constant time. That being said, it also introduces some overhead to be able to do that. Julia provide such data structure (more especially queues).

Since the number of in-flight items appears to be bounded in your algorithm, you can implement a rolling buffer. Fortunately, Julia also implement this: see CircularBuffer. This solution should be quite simple and fast (since the operations you want to do are done in O(1) time on it).

CodePudding user response:

It is probably simplest to use CircularArrays.jl for your use case:

julia> using CircularArrays

julia> c = CircularArray([1,2,3,4])
4-element CircularVector(::Vector{Int64}):
 1
 2
 3
 4

julia> for i in 5:10
           c[i] = i
           @show c
       end
c = [5, 2, 3, 4]
c = [5, 6, 3, 4]
c = [5, 6, 7, 4]
c = [5, 6, 7, 8]
c = [9, 6, 7, 8]
c = [9, 10, 7, 8]

In this way - as you can see - you can can continue using an increasing index and array will wrap around internally as needed (discarding old values that are not needed any more).

In this way you always store last p values in the array without having to copy anything or re-allocate memory in each step.

CodePudding user response:

...only the most recently generated p values will be useful to me...

Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value.

Cons of the first are that we are changing the length of the vector at each iteration.

There's no need to change the length of the vector. Simply shift its elements to the left (overwriting the first element) and write the new data to the_vector[end]:

the_vector = [1,2,3,4,5,6]

function shift_and_add!(vec::AbstractVector, value)
    vec[1:end-1] .= @view vec[2:end] # shift
    vec[end] = value # replace the last value
    vec
end

@assert shift_and_add!(the_vector, 80) == [2,3,4,5,6,80]
# `the_vector` will be mutated
@assert the_vector == [2,3,4,5,6,80]
  • Related