I'm struggling to understand the rule in the Java docs (https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html) about parallel streams, which states: "The Stream.collect(Collector) implementation will only perform a concurrent reduction if: 1. ... 2. ... 3. Either the stream is unordered, or the collector has the Collector.Characteristics.UNORDERED characteristic." We have four cases:
- Both are ordered, which clearly will **not** take advantage of concurrent reduction.
- Both are unordered, which clearly will take advantage of concurrent reduction.
- The stream is unordered and the collector is ordered, which also clearly will take advantage of concurrent reduction.
- The stream is ordered and the collector is unordered, which meets the third condition and is supposed to take advantage of concurrent reduction. However, as I understand it, the stream itself is responsible for reordering the outputs of the parallel threads to keep the stream ordered. That means (in my understanding) that performance is affected by this step regardless of whether the collector is ordered or unordered: the stream reorders the outputs, which costs performance, and the collector then simply stores the elements as it receives them, with no further impact on performance.
So the third condition should be "both of them are unordered" or "the stream is unordered". I know there is something wrong in my logic, but I couldn't find any clarification on the internet explaining this point. Could anyone please explain what is wrong in my understanding?
I searched the internet for an answer and also tried asking ChatGPT (which is very useful for studying, by the way), but I didn't get a satisfying answer.
CodePudding user response:
The collector is not an isolated part of the stream pipeline. The stream knows about the collector and can alter its behavior accordingly. Think of forEach() and forEachOrdered(). When called on a parallel stream (well, technically any stream), forEach() makes no ordering guarantees, regardless of whether the stream itself is ordered. Conversely, forEachOrdered() can't make any ordering guarantees if the stream itself has no encounter order. Only when an ordered terminal operation is combined with an ordered stream do the elements have to be processed in encounter order. The same goes for collectors.
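The forEach() versus forEachOrdered() comparison can be tried out with a minimal sketch like the one below. The class name ForEachOrderDemo and the two helper methods are made up for illustration; only the stream calls themselves are standard JDK API. Both helpers run the same ordered, parallel stream and differ only in the terminal operation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ForEachOrderDemo {

    // Hypothetical helper: ordered parallel stream + ordered terminal operation.
    // forEachOrdered() must hand over the elements in encounter order.
    static List<Integer> viaForEachOrdered(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).boxed().parallel().forEachOrdered(out::add);
        return out;
    }

    // Hypothetical helper: same ordered parallel stream, but forEach() makes
    // no ordering promise, so the elements may arrive in any order.
    static List<Integer> viaForEach(int n) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).boxed().parallel().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> expected =
                IntStream.range(0, 1000).boxed().collect(Collectors.toList());

        // Encounter order is preserved, so this comparison holds.
        System.out.println("forEachOrdered in order: "
                + viaForEachOrdered(1000).equals(expected));

        // Same elements, but the order depends on thread scheduling.
        System.out.println("forEach in order: "
                + viaForEach(1000).equals(expected));
    }
}
```

The stream is identical in both helpers; what changes the amount of ordering work is the terminal operation, which is the point being made about collectors as well.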
CodePudding user response:
Regarding "the collector has the Collector.Characteristics.UNORDERED characteristic":
The description for this characteristic is:
"Indicates that the collection operation does not commit to preserving the encounter order of input elements. (This might be true if the result container has no intrinsic order, such as a Set.)"
If the result of the collect operation has no intrinsic order, the Stream.collect() operation doesn't need to preserve any order present in the stream (since that order is lost in the result anyway), and it can therefore use a concurrent reduction (which will probably lose that order).
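To make the rule from the question concrete, here is a rough sketch that restates the three documented conditions as a boolean check and applies it to a few standard collectors. The class ConcurrentReductionCheck and the method meetsDocumentedConditions are hypothetical names, and the check is only a paraphrase of the documentation, not the JDK's actual code. Whether the stream is ordered is passed in as a plain boolean, which is an assumption made for simplicity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

public class ConcurrentReductionCheck {

    // Hypothetical helper restating the documented rule:
    // parallel stream AND CONCURRENT collector AND
    // (unordered stream OR UNORDERED collector).
    static boolean meetsDocumentedConditions(boolean parallel,
                                             boolean streamOrdered,
                                             Collector<?, ?, ?> collector) {
        Set<Characteristics> ch = collector.characteristics();
        return parallel
                && ch.contains(Characteristics.CONCURRENT)
                && (!streamOrdered || ch.contains(Characteristics.UNORDERED));
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

        // toList(): neither CONCURRENT nor UNORDERED.
        System.out.println(
                meetsDocumentedConditions(true, true, Collectors.toList())); // false

        // toSet(): UNORDERED, but not CONCURRENT.
        System.out.println(
                meetsDocumentedConditions(true, true, Collectors.toSet())); // false

        // groupingByConcurrent(): CONCURRENT and UNORDERED, so an ordered
        // parallel stream still satisfies the third condition.
        Collector<String, ?, ConcurrentMap<Character, List<String>>> byFirstLetter =
                Collectors.groupingByConcurrent(w -> w.charAt(0));
        System.out.println(
                meetsDocumentedConditions(true, true, byFirstLetter)); // true

        // The list is an ordered source, yet the collector has opted out of
        // ordering, so the order of words inside each group is not guaranteed.
        ConcurrentMap<Character, List<String>> grouped =
                words.parallelStream().collect(byFirstLetter);
        System.out.println(grouped.keySet());
    }
}
```

This is the "ordered stream, unordered collector" case from the question: because the collector declares UNORDERED, the stream has no reason to restore encounter order before handing elements over, so there is no reordering cost left to pay.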