How does Stream.reduce(identity, accumulator, combiner) work?


Stream.reduce has 3 method overloads.

reduce(BinaryOperator<T> accumulator)
reduce(T identity, BinaryOperator<T> accumulator)
reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
  • The 1st overload can be used, for example, to calculate the sum of a list of integers. It returns an Optional, which is empty if the stream is empty.
  • The 2nd overload does the same, but if the stream is empty it returns the identity value instead (see the sketch below).
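
For example, a minimal sketch of the first two overloads (the list is just an illustration):

List<Integer> ints = List.of(1, 2, 3);
Optional<Integer> maybeSum = ints.stream().reduce(Integer::sum); // Optional[6]; empty Optional for an empty list
int sum = ints.stream().reduce(0, Integer::sum);                 // 6; the identity 0 for an empty list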

I'm having a hard time understanding how the third overload (Stream.reduce(identity, accumulator, combiner)) works and what its use case is. So, how does it work, and why does it exist?

CodePudding user response:

If I understand correctly, your question is about the third argument combiner.

Firstly, one of the design goals of the Stream API was to offer the same API for sequential and parallel streams. The 3-argument version of reduce exists largely to support parallel streams.

Suppose you are reducing a Collection<T> to a value of type U, and you are using a parallel stream. The parallel stream splits the collection into smaller chunks and produces a partial U value for each chunk by applying the second function (the accumulator). But these partial U values then have to be combined. How do they get combined? The third function (the combiner) provides that logic.
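
As a minimal sketch (the strings here are just an illustration), reducing a Stream<String> (T = String) to a total length (U = Integer) on a parallel stream:

int totalLength = List.of("alpha", "beta", "gamma").parallelStream()
        .reduce(0,                                        // identity: the empty partial result
                (partial, str) -> partial + str.length(), // accumulator: folds one String into an Integer
                Integer::sum);                            // combiner: merges partial Integers from different chunks
System.out.println(totalLength); // 14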

CodePudding user response:

Note: Some of the examples are contrived for demonstration. In some instances a simple .sum() could have been used.

The big difference, imo, is that the third form takes a BiFunction as its second argument instead of a BinaryOperator, so you can use the third form to change the result type. It also takes a BinaryOperator as a combiner to merge the different results from parallel operations.

Generate some data

record Data(String name, int value) {}

Random r = new Random();
List<Data> dataList = r.ints(1000, 1, 20).mapToObj(i -> new Data("Item" + i, i)).toList();

No parallel operation here, but different types. The third argument is still required, so just return the sum.

int sum = dataList.stream().reduce(0, (item, data) -> item + data.value(),
        (finalSum, partialSum) -> finalSum); // combiner is never invoked on a sequential stream
System.out.println(sum);

prints

10162

The second form. Use map to get the value to be summed. A BinaryOperator is used here since the types are the same and there is no parallel operation.

sum = dataList.stream().map(Data::value).reduce(0, (sum1, val) -> sum1 + val);
System.out.println(sum); // print same as above

This shows the same as above, but in parallel. The third argument combines the partial sums, and they are combined as each thread finishes, so there may not be a sensible order to the output.

sum = dataList.parallelStream().reduce(0, (sum1, data) -> sum1 + data.value(),
        (finalSum, partialSum) -> {
            System.out.println("Adding " + partialSum + " to " + finalSum);
            finalSum += partialSum;
            return finalSum;
        });
System.out.println(sum);

prints something like the following

Adding 586 to 670
Adding 567 to 553
Adding 1256 to 1120
Adding 715 to 620
Adding 624 to 601
Adding 1335 to 1225
Adding 2560 to 2376
Adding 662 to 579
Adding 706 to 715
Adding 1421 to 1241
Adding 713 to 689
Adding 576 to 586
Adding 1402 to 1162
Adding 2662 to 2564
Adding 4936 to 5226
10162

One final note. None of the Collectors.reducing methods take a BiFunction to handle different types. Instead, the second argument is a Function that acts as a mapper, so the third argument, a BinaryOperator, can combine the mapped values.

sum = dataList.parallelStream().collect(
       Collectors.reducing(0, Data::value, (finalSum, partialSum) -> {
           System.out.println(
                   "Adding " + partialSum + " to " + finalSum);
           finalSum += partialSum;
           return finalSum;
       }));

System.out.println(sum);

It is interesting to note that for parallel operation the Collector version generates more threads with smaller chunks than the inline version.

CodePudding user response:

It can also be used to put things into a container.

An example would be as follows (assembled into a runnable sketch after the list):

  • identity = new ArrayList<>()
  • accumulator = (list, element) -> { list.add(element); return list; }
  • combiner = (listA, listB) -> { listA.addAll(listB); return listA; }
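
Assembled, a minimal sketch (the Stream.of elements are just an illustration):

List<String> letters = Stream.of("a", "b", "c")
        .reduce(new ArrayList<>(),                                         // identity: an empty list
                (list, element) -> { list.add(element); return list; },    // accumulator: add one element
                (listA, listB) -> { listA.addAll(listB); return listA; }); // combiner: merge two partial lists
System.out.println(letters); // [a, b, c]

Note that a mutable identity like this is only safe on a sequential stream; for parallel streams the mutable-container pattern belongs to collect (e.g. Collectors.toList()), since reduce expects an identity value it can hand to every chunk.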

CodePudding user response:

Basically it combines a mapping function with a reduction. Most of the examples I've seen for this don't really demonstrate why it's preferable to calling map() and a normal reduce() in separate steps. The API Note comes in handy here:

Many reductions using this form can be represented more simply by an explicit combination of map and reduce operations. The accumulator function acts as a fused mapper and accumulator, which can sometimes be more efficient than separate mapping and reduction, such as when knowing the previously reduced value allows you to avoid some computation.

So let's say we have a Stream<String> numbers, and we want to parse them to BigDecimal and calculate their product. We could do something like this:

BigDecimal product = numbers.map(BigDecimal::new)
        .reduce(BigDecimal.ONE, BigDecimal::multiply);

But this has an inefficiency. If one of the numbers is "0", we're wasting cycles converting the remainder to BigDecimal. We can use the 3-arg reduce() here to bypass the mapping logic:

BigDecimal product = numbers.reduce(BigDecimal.ONE,
        (d, n) -> d.equals(BigDecimal.ZERO) ? BigDecimal.ZERO : new BigDecimal(n).multiply(d),
        BigDecimal::multiply);

Of course it would be even more efficient to short-circuit the stream entirely, but that's tricky to do in a stream, especially in parallel. And this is just an example to get the concept across.
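
For completeness, one way to short-circuit is sketched below, assuming the source is a List rather than a one-shot Stream (a Stream cannot be traversed twice); anyMatch stops at the first "0" it finds:

List<String> numberList = List.of("3", "0", "7");
BigDecimal product = numberList.stream().anyMatch("0"::equals)
        ? BigDecimal.ZERO
        : numberList.stream().map(BigDecimal::new).reduce(BigDecimal.ONE, BigDecimal::multiply);
System.out.println(product); // 0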
