If I have, let's say, a vector with millions of elements, and I keep being asked for the sum of all elements of a slice of this vector, what data structure would be suitable for this task?
I could just iterate over the slice, but it seems like I'm wasting previous calculations if I keep doing that every time. If I calculate the sum for v[500_000..750_000]
and then calculate the sum of v[400_000..600_000],
there is an overlap between those, and it seems I could use some dynamic programming to reuse calculations.
CodePudding user response:
There are many ways to approach your problem, which arises very frequently in programming. The data structures below are listed roughly in ascending order of difficulty (earlier means easier).
- Prefix sum: Compute a[0..0], a[0..1], a[0..2], a[0..3], ..., a[0..N-1]; then you can get a[l..r] as a[0..r] - a[0..l-1].
  Complexity: O(N) preprocessing, O(1) query.
- Sparse table: Make a 2D array with dimensions st[log N][N], where st[i][j] stores the sum a[j..j+2^i-1].
  Complexity: O(N log N) preprocessing, O(log N) query. (The famous O(1) sparse table query only works for idempotent operations like min/max, not for sums.)
- Segment tree: Essentially a binary tree where each vertex stores the sum of some segment of the array. E.g., the root stores a[0..N-1], the children of the root store a[0..(N-1)/2] and a[(N-1)/2+1..N-1] respectively, and the rest continues recursively.
However, the power of a segment tree really lies in the ability to modify elements. Using it for range sum queries on an immutable array is overkill and introduces unnecessary memory usage.
  Complexity: O(N) preprocessing, O(log N) query.
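In case you do end up needing updates, here is a minimal Rust sketch of an iterative segment tree for range sums (names like `SegTree`, `update`, and `query` are illustrative, not from any library). Queries use half-open ranges to match Rust's `v[l..r]` slice syntax:

```rust
// Iterative (bottom-up) segment tree over i64 sums.
struct SegTree {
    n: usize,
    tree: Vec<i64>, // size 2n; leaves live at indices n..2n, node i has children 2i and 2i+1
}

impl SegTree {
    // O(N) build.
    fn new(a: &[i64]) -> Self {
        let n = a.len();
        let mut tree = vec![0i64; 2 * n];
        tree[n..2 * n].copy_from_slice(a); // leaves
        for i in (1..n).rev() {
            tree[i] = tree[2 * i] + tree[2 * i + 1]; // internal nodes
        }
        SegTree { n, tree }
    }

    // Set a[pos] = value, recomputing the O(log N) ancestors.
    fn update(&mut self, pos: usize, value: i64) {
        let mut i = pos + self.n;
        self.tree[i] = value;
        while i > 1 {
            i /= 2;
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1];
        }
    }

    // Sum of a[l..r] (half-open) in O(log N).
    fn query(&self, mut l: usize, mut r: usize) -> i64 {
        let mut sum = 0i64;
        l += self.n;
        r += self.n;
        while l < r {
            if l % 2 == 1 { sum += self.tree[l]; l += 1; } // l is a right child: take it, move right
            if r % 2 == 1 { r -= 1; sum += self.tree[r]; } // r is a right child: take r-1
            l /= 2;
            r /= 2;
        }
        sum
    }
}
```

For example, after `let mut st = SegTree::new(&[1, 2, 3, 4, 5]);`, `st.query(1, 4)` returns 9, and after `st.update(2, 10)` the same query returns 16.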
In general, prefix sums should be the way to go unless you need other functionality later on, because they are the most intuitive and the most efficient in terms of both memory and runtime. All of these structures incorporate the idea of dynamic programming: reusing previously computed partial sums.
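For the recommended option, here is a minimal Rust sketch of the prefix-sum approach (`PrefixSums` and `range_sum` are illustrative names). It uses half-open ranges so queries line up with the `v[l..r]` slices in the question:

```rust
struct PrefixSums {
    // prefix[i] = sum of v[0..i], so prefix[0] = 0 and prefix.len() = v.len() + 1
    prefix: Vec<i64>,
}

impl PrefixSums {
    // O(N) preprocessing: one pass over the vector.
    fn new(v: &[i64]) -> Self {
        let mut prefix = Vec::with_capacity(v.len() + 1);
        prefix.push(0);
        for &x in v {
            let last = *prefix.last().unwrap();
            prefix.push(last + x);
        }
        PrefixSums { prefix }
    }

    // O(1) query: sum of v[l..r] (half-open).
    fn range_sum(&self, l: usize, r: usize) -> i64 {
        self.prefix[r] - self.prefix[l]
    }
}
```

With this, the sums of v[500_000..750_000] and v[400_000..600_000] each cost a single subtraction, and the overlapping region is never recomputed.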