Java implement accumulator class that provides a Collector

Time:11-12

A Collector has three generic types:

public interface Collector<T, A, R>

With A being the mutable accumulation type of the reduction operation (often hidden as an implementation detail).

If I want to create my custom collector, I need to create two classes:

  • one for the custom accumulation type
  • one for the custom collector itself

Is there any library function/trick that takes the accumulation type and provides a corresponding Collector?

Simple example

This example is deliberately simple to illustrate the question. I know I could use reduce for this case, but that is not what I am looking for. I have a more complex example, but sharing it here would make the question too long; it follows the same idea.

Let's say I want to collect the sum of a stream and return it as a String.

I can implement my accumulator class:

public static class SumCollector {
    Integer value;

    public SumCollector(Integer value) {
        this.value = value;
    }

    public static SumCollector supply() {
        return new SumCollector(0);
    }

    public void accumulate(Integer next) {
        value += next;
    }

    public SumCollector combine(SumCollector other) {
        return new SumCollector(value + other.value);
    }

    public String finish() {
        return Integer.toString(value);
    }
}

And then I can create a Collector from this class:

Collector.of(SumCollector::supply, SumCollector::accumulate, SumCollector::combine, SumCollector::finish);

But it seems strange to me that all four references point into the other class; I feel there should be a more direct way to do this.

To keep only one class, I could make it implement Collector<Integer, SumCollector, String>, but then every function would be duplicated (supplier() would return SumCollector::supply, etc.).
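For reference, the single-class alternative described above might look like the following sketch. It is a hypothetical rewrite of SumCollector, not a library API: the class is both the accumulation type and the Collector, and each Collector method only hands back a reference to one of the class's own methods, which is exactly the duplication in question.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical single-class variant: SumCollector is both the accumulation
// type (A) and the Collector. The collector instance's own "value" is unused.
class SumCollector implements Collector<Integer, SumCollector, String> {

    private int value;

    static SumCollector supply() {
        return new SumCollector();
    }

    void accumulate(Integer next) {
        value += next;
    }

    SumCollector combine(SumCollector other) {
        value += other.value;
        return this;
    }

    String finish() {
        return Integer.toString(value);
    }

    // Each Collector method merely repeats a reference to a method above.
    @Override
    public Supplier<SumCollector> supplier() {
        return SumCollector::supply;
    }

    @Override
    public BiConsumer<SumCollector, Integer> accumulator() {
        return SumCollector::accumulate;
    }

    @Override
    public BinaryOperator<SumCollector> combiner() {
        return SumCollector::combine;
    }

    @Override
    public Function<SumCollector, String> finisher() {
        return SumCollector::finish;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        String sum = List.of(1, 2, 3, 4).stream().collect(new SumCollector());
        System.out.println(sum); // prints 10
    }
}
```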

Answer:

There is no requirement for the functions to be implemented as methods of the container class.

This is how such a sum collector would typically be implemented:

public static Collector<Integer, ?, Integer> sum() {
    return Collector.of(() -> new int[1],
        (a, i) -> a[0] += i,
        (a, b) -> { a[0] += b[0]; return a; },
        a -> a[0],
        Collector.Characteristics.UNORDERED);
}

But, of course, you could also implement it as

public static Collector<Integer, ?, Integer> sum() {
    return Collector.of(AtomicInteger::new,
        AtomicInteger::addAndGet,
        (a, b) -> { a.addAndGet(b.intValue()); return a; },
        AtomicInteger::intValue,
        Collector.Characteristics.UNORDERED, Collector.Characteristics.CONCURRENT);
}

You first have to find a suitable mutable container type for your collector. If no such type exists, you have to create your own class. The functions can be implemented as a method reference to an existing method or as a lambda expression.
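As a minimal sketch of the "existing container type" case, java.lang.StringBuilder can serve as the mutable container for concatenating strings, so all four functions can be method references. The names ConcatDemo and concat are hypothetical, and for real code the JDK's Collectors.joining() already covers this use case.

```java
import java.util.List;
import java.util.stream.Collector;

class ConcatDemo {

    // StringBuilder is already a suitable mutable container, so no new class is needed.
    static Collector<String, StringBuilder, String> concat() {
        return Collector.of(
                StringBuilder::new,      // supplier: a fresh, empty container
                StringBuilder::append,   // accumulator: append(String)
                StringBuilder::append,   // combiner: append(CharSequence) returns the left builder
                StringBuilder::toString  // finisher: container -> result
        );
    }

    public static void main(String[] args) {
        System.out.println(List.of("a", "b", "c").stream().collect(concat())); // prints abc
    }
}
```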

For the more complex example, I don't know of a suitable existing type for holding an int and a List, but you may get away with an AbstractMap.SimpleEntry that pairs the List with a boxed Integer, like this:

final Map<String, Integer> map = …
List<String> keys = map.entrySet().stream().collect(keysToMaximum());
public static <K> Collector<Map.Entry<K,Integer>, ?, List<K>> keysToMaximum() {
    return Collector.of(
        () -> new AbstractMap.SimpleEntry<>(new ArrayList<K>(), Integer.MIN_VALUE),
        (current, next) -> {
            int max = current.getValue(), value = next.getValue();
            if(value >= max) {
                if(value > max) {
                    current.setValue(value);
                    current.getKey().clear();
                }
                current.getKey().add(next.getKey());
            }
        }, (a, b) -> {
            int maxA = a.getValue(), maxB = b.getValue();
            if(maxA < maxB) return b;
            if(maxA == maxB) a.getKey().addAll(b.getKey());
            return a;
        },
        Map.Entry::getKey
    );
}
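A runnable sketch of the SimpleEntry-based collector, using hypothetical sample data in place of the elided map. Two details are my own choices: the entry is created with explicit type arguments (SimpleEntry<List<K>, Integer>), and the combiner returns b only when maxA < maxB, so partial results with equal maxima are merged instead of a's keys being dropped.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

class KeysToMaxEntryDemo {

    // Container: key = keys found so far, value = current maximum (boxed).
    static <K> Collector<Map.Entry<K, Integer>, ?, List<K>> keysToMaximum() {
        return Collector.of(
            () -> new AbstractMap.SimpleEntry<List<K>, Integer>(new ArrayList<>(), Integer.MIN_VALUE),
            (current, next) -> {
                int max = current.getValue(), value = next.getValue();
                if (value >= max) {
                    if (value > max) {          // new maximum: forget the old keys
                        current.setValue(value);
                        current.getKey().clear();
                    }
                    current.getKey().add(next.getKey());
                }
            },
            (a, b) -> {
                int maxA = a.getValue(), maxB = b.getValue();
                if (maxA < maxB) return b;       // b holds the larger maximum
                if (maxA == maxB) a.getKey().addAll(b.getKey());
                return a;
            },
            Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: "b" and "c" share the maximum value 5.
        Map<String, Integer> map = Map.of("a", 3, "b", 5, "c", 5, "d", 1);
        List<String> keys = map.entrySet().stream().collect(keysToMaximum());
        System.out.println(keys); // b and c, in Map.of's unspecified iteration order
    }
}
```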

But you may also create a new dedicated container class as an ad-hoc type, not visible outside the particular collector

public static <K> Collector<Map.Entry<K,Integer>, ?, List<K>> keysToMaximum() {
    return Collector.of(() -> new Object() {
        int max = Integer.MIN_VALUE;
        final List<K> keys = new ArrayList<>();
    }, (current, next) -> {
        int value = next.getValue();
        if(value >= current.max) {
            if(value > current.max) {
                current.max = value;
                current.keys.clear();
            }
            current.keys.add(next.getKey());
        }
    }, (a, b) -> {
        if(a.max < b.max) return b;
        if(a.max == b.max) a.keys.addAll(b.keys);
        return a;
    },
    a -> a.keys);
}
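To see the combiner at work, the following sketch runs the anonymous-class version on a parallel stream. The helper keysWithMaxValueMod and its synthetic data (key i mapped to i % mod) are hypothetical, and Map.entry requires Java 9 or later. The combiner returns b only when a.max < b.max, so equal maxima coming from different chunks are merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class KeysToMaxAnonDemo {

    // The inferred accumulation type is the anonymous class itself,
    // so the lambdas can read and write its fields directly.
    static <K> Collector<Map.Entry<K, Integer>, ?, List<K>> keysToMaximum() {
        return Collector.of(() -> new Object() {
            int max = Integer.MIN_VALUE;
            final List<K> keys = new ArrayList<>();
        }, (current, next) -> {
            int value = next.getValue();
            if (value >= current.max) {
                if (value > current.max) {
                    current.max = value;
                    current.keys.clear();
                }
                current.keys.add(next.getKey());
            }
        }, (a, b) -> {
            if (a.max < b.max) return b;
            if (a.max == b.max) a.keys.addAll(b.keys);
            return a;
        }, a -> a.keys);
    }

    // Hypothetical helper: keys 0..n-1 mapped to i % mod, collected in parallel
    // so that partial results have to be merged by the combiner.
    static List<Integer> keysWithMaxValueMod(int n, int mod) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(i -> Map.entry(i, i % mod))
                .collect(keysToMaximum());
    }

    public static void main(String[] args) {
        List<Integer> keys = keysWithMaxValueMod(1000, 7);
        System.out.println(keys.size()); // prints 142 (the keys 6, 13, ..., 993)
    }
}
```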

The takeaway is, you don’t need to create a new, named class to create a Collector.

Answer:

It sounds like you want to supply only the reduction function itself, not all of the other things that come with a generic Collector. Perhaps you're looking for Collectors.reducing.

public static <T> Collector<T,?,T> reducing(T identity, BinaryOperator<T> op)

Then, to sum values, you would write

Collectors.reducing(0, (x, y) -> x + y);

or, in context,

Integer[] myList = new Integer[] { 1, 2, 3, 4 };
var collector = Collectors.reducing(0, (x, y) -> x + y);
System.out.println(Stream.of(myList).collect(collector)); // Prints 10
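Since the question wants the sum as a String, one way to get there with stock collectors is to wrap Collectors.reducing in Collectors.collectingAndThen, which applies a finishing function to the reduced value. This is a sketch; the names ReducingToStringDemo and sumAsString are hypothetical.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReducingToStringDemo {

    // Reduce to an Integer sum, then convert the result to a String.
    static Collector<Integer, ?, String> sumAsString() {
        return Collectors.collectingAndThen(
                Collectors.reducing(0, Integer::sum),
                sum -> Integer.toString(sum));
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).collect(sumAsString())); // prints 10
    }
}
```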

Answer:

I want to focus on the wording of one point in your question, because I feel it could be the crux of the underlying confusion.

If I want to create my custom collector, I need to create two classes:

  • one for the custom accumulation type
  • one for the custom collector itself

No, you need to create only one class, that of your custom accumulator. You should use the appropriate factory method to instantiate your custom Collector, as you demonstrate yourself in the question.

Perhaps you meant to say that you need to create two instances. And that is also incorrect; you need to create a Collector instance, but to support the general case, many instances of the accumulator can be created (e.g., groupingBy()). Thus, you can't simply instantiate the accumulator yourself, you need to provide its Supplier to the Collector, and delegate to the Collector the ability to instantiate as many instances as required.
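To make the "many instances" point concrete, here is a sketch that counts how often a downstream collector's supplier runs under groupingBy(). The names SupplierCallsDemo, totalLength, totalLengthByInitial and CREATED are hypothetical. In a sequential run, the current JDK implementation creates one container per distinct key; treat that as an implementation detail rather than a documented guarantee.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SupplierCallsDemo {

    // Incremented every time a new accumulator container is created.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Downstream collector that sums string lengths in an int[1] container.
    static Collector<String, ?, Integer> totalLength() {
        return Collector.of(
                () -> {
                    CREATED.incrementAndGet();
                    return new int[1];
                },
                (a, s) -> a[0] += s.length(),
                (a, b) -> { a[0] += b[0]; return a; },
                a -> a[0]);
    }

    // groupingBy obtains a fresh container from the downstream supplier for each group.
    static Map<Character, Integer> totalLengthByInitial(List<String> words) {
        CREATED.set(0);
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), totalLength()));
    }

    public static void main(String[] args) {
        Map<Character, Integer> lengths =
                totalLengthByInitial(List.of("apple", "avocado", "banana", "blueberry", "cherry"));
        System.out.println(lengths);       // a=12, b=15, c=6 (map order not guaranteed)
        System.out.println(CREATED.get()); // 3 containers, one per initial letter
    }
}
```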

Now, think about the overloaded Collector.of() method you feel is missing, the "more direct way to do this." Clearly, such a method would still require a Supplier, one that creates instances of your custom accumulator. But Stream.collect() needs to interact with your custom accumulator instances to perform the accumulate and combine operations. So the Supplier would have to instantiate something like this Accumulator interface:

public interface Accumulator<T, A extends Accumulator<T, A, R>, R> {

    /**
     * @param t a value to be folded into this mutable result container
     */
    void accumulate(T t);

    /**
     * @param that another partial result to be merged with this container
     * @return the combined results, which may be {@code this}, {@code that}, or a new container
     */
    A combine(A that);

    /**
     * @return the final result of transforming this intermediate accumulator
     */
    R finish();

}

With that, it's straightforward to create Collector instances from a Supplier of accumulators, for example with a static factory method declared in the Accumulator interface itself:

    static <T, A extends Accumulator<T, A, R>, R> 
    Collector<T, ?, R> of(Supplier<A> supplier, Collector.Characteristics ... characteristics) {
        return Collector.of(supplier, 
                            Accumulator::accumulate, 
                            Accumulator::combine, 
                            Accumulator::finish, 
                            characteristics);
    }

Then, you'd be able to define your custom Accumulator:

final class Sum implements Accumulator<Integer, Sum, String> {

    private int value;

    @Override
    public void accumulate(Integer next) {
        value += next;
    }

    @Override
    public Sum combine(Sum that) {
        value += that.value;
        return this;
    }

    @Override
    public String finish(){
        return Integer.toString(value);
    }

}

And use it:

String sum = ints.stream().collect(Accumulator.of(Sum::new, Collector.Characteristics.UNORDERED));
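Putting the pieces together, a self-contained sketch could look like this. It assumes the static of method is declared inside the Accumulator interface (which is what the call Accumulator.of(...) suggests) and that ints is a List<Integer>; the enclosing class AccumulatorDemo and its sumOf helper are hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collector;

class AccumulatorDemo {

    // The Accumulator interface from the answer, with the static factory inside it.
    interface Accumulator<T, A extends Accumulator<T, A, R>, R> {

        void accumulate(T t);

        A combine(A that);

        R finish();

        static <T, A extends Accumulator<T, A, R>, R>
        Collector<T, ?, R> of(Supplier<A> supplier, Collector.Characteristics... characteristics) {
            return Collector.of(supplier,
                    Accumulator::accumulate,
                    Accumulator::combine,
                    Accumulator::finish,
                    characteristics);
        }
    }

    // The answer's Sum accumulator.
    static final class Sum implements Accumulator<Integer, Sum, String> {

        private int value;

        @Override
        public void accumulate(Integer next) {
            value += next;
        }

        @Override
        public Sum combine(Sum that) {
            value += that.value;
            return this;
        }

        @Override
        public String finish() {
            return Integer.toString(value);
        }
    }

    // Hypothetical helper wrapping the usage line.
    static String sumOf(List<Integer> ints) {
        return ints.stream().collect(Accumulator.of(Sum::new, Collector.Characteristics.UNORDERED));
    }

    public static void main(String[] args) {
        System.out.println(sumOf(List.of(1, 2, 3, 4))); // prints 10
    }
}
```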

Now… it works, and there's nothing too horrible about it, but is all the Accumulator<A extends Accumulator<A>> mumbo-jumbo "more direct" than this?

final class Sum {

    private int value;

    private void accumulate(Integer next) {
        value += next;
    }

    private Sum combine(Sum that) {
        value += that.value;
        return this;
    }

    @Override
    public String toString() {
        return Integer.toString(value);
    }

    static Collector<Integer, ?, String> collector() {
        return Collector.of(Sum::new, Sum::accumulate, Sum::combine, Sum::toString, Collector.Characteristics.UNORDERED);
    }

}

And really, why have an Accumulator dedicated to collecting to a String? Wouldn't reduction to a custom type be more interesting? Something along the lines of IntSummaryStatistics, which has other useful methods like getAverage() alongside toString()? This approach is a lot more powerful, requires only one (mutable) class (the result type), and can encapsulate all of its mutators as private methods rather than implementing a public interface.
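A minimal sketch of that idea, using a hypothetical IntStats class (for real code, the JDK already offers IntSummaryStatistics and Collectors.summarizingInt). The container is also the result, its mutators are private, and the four-argument Collector.of overload (supplier, accumulator, combiner, characteristics) gives the collector the IDENTITY_FINISH characteristic, so no finisher is needed.

```java
import java.util.List;
import java.util.stream.Collector;

// Hypothetical result type in the spirit of IntSummaryStatistics.
final class IntStats {

    private long count;
    private long sum;

    // Private mutators: only the collector below can change the state.
    private void accept(Integer value) {
        count++;
        sum += value;
    }

    private IntStats merge(IntStats other) {
        count += other.count;
        sum += other.sum;
        return this;
    }

    long count() { return count; }

    long sum() { return sum; }

    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    @Override
    public String toString() {
        return "IntStats{count=" + count + ", sum=" + sum + ", average=" + average() + "}";
    }

    // The container itself is the result, so this overload needs no finisher.
    static Collector<Integer, IntStats, IntStats> collector() {
        return Collector.of(IntStats::new, IntStats::accept, IntStats::merge,
                Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        IntStats stats = List.of(1, 2, 3, 4).stream().collect(collector());
        System.out.println(stats.sum() + " " + stats.average()); // prints 10 2.5
    }
}
```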

So, you're welcome to use something like Accumulator, but it doesn't really fill a real gap in the core Collector repertoire.
