Suppose I am given integers a
and b
such that a range of interest is formed, of integers in [a,b]
. The range can span well over 10^9
integers. I want to sum the values of a given function f : N -> N
over all integers a <= n <= b
. The range is very large so I want to do this using multithreading.
Less formaly, I want to parallelize the following code:
long sum = 0;
for (long n = a ; n <= b ; n )
sum = f(n);
System.out.println(sum);
Ideally (at least in my mind), the range will divided equally across the available number of threads available by the processor (suppose f(n)
has near-identical complexity and running time for each n
in range). The values are completely independent, and f
could really be any function. For example, it could output the sum of digits of the number, but it really could be anything, it's just an example.
Is there a general way to do exactly that in Java using multithreading?
CodePudding user response:
This particular use-case fits very well for a parallell stream. You can implement it like this:
long sum = LongStream.rangeClosed(a, b)
.parallel()
.map(n -> f(n))
.sum();
System.out.println(sum);
CodePudding user response:
You probably want to look into the fork/join framework; it's a generalized approach to this principle of splitting up a task into a great many little independent tasks and then combining them back together, and gives you all control you'd want about how to spin off threads.
Alternatively, you can use the parallel()
method of the stream API, but note that this method doesn't explain or guarantee much about how it works. You have no control over which aspect is parallelized, and no control over the amount of threads that'll be involved. For trivial cases you might as well use it, but as a rule, if 'uhoh I better write this in parallel or it would be too slow' is relevant, then you need some guarantees and some control. Here's oracle's tutorial/explanation on parallel streams. For this specific case it does seem like it would more or less do what you want (it gets tricky if e.g. the stream you attempt to apply this to is e.g. what Files.lines
gives you - where the parallelism gets potentially stymied by where it is applied vs. where the bottleneck is).