A Collector has three generic types:
public interface Collector<T, A, R>
With A being the mutable accumulation type of the reduction operation (often hidden as an implementation detail).
If I want to create my custom collector, I need to create two classes:
- one for the custom accumulation type
- one for the custom collector itself
Is there any library function/trick that takes the accumulation type and provides a corresponding Collector?
Simple example
This example is deliberately simple to illustrate the question. I know I could use reduce for this case, but that is not what I am looking for. There is a more complex example, but sharing it here would make the question too long; it is the same idea.
Let's say I want to collect the sum of a stream and return it as a String.
I can implement my accumulator class:
public static class SumCollector {
    Integer value;
    public SumCollector(Integer value) {
        this.value = value;
    }
    public static SumCollector supply() {
        return new SumCollector(0);
    }
    public void accumulate(Integer next) {
        value += next;
    }
    public SumCollector combine(SumCollector other) {
        return new SumCollector(value + other.value);
    }
    public String finish() {
        return Integer.toString(value);
    }
}
And then I can create a Collector from this class:
Collector.of(SumCollector::supply, SumCollector::accumulate, SumCollector::combine, SumCollector::finish);
But it seems strange to me that they all refer to the other class; I feel that there should be a more direct way to do this.
What I could do to keep only one class would be to have it implement Collector<Integer, SumCollector, String>, but then every function would be duplicated (supplier() would return SumCollector::supply, etc.).
CodePudding user response:
There is no requirement for the functions to be implemented as methods of the container class.
This is how such a sum collector would typically be implemented:
public static Collector<Integer, ?, Integer> sum() {
    return Collector.of(() -> new int[1],
        (a, i) -> a[0] += i,
        (a, b) -> { a[0] += b[0]; return a; },
        a -> a[0],
        Collector.Characteristics.UNORDERED);
}
But, of course, you could also implement it as
public static Collector<Integer, ?, Integer> sum() {
    return Collector.of(AtomicInteger::new,
        AtomicInteger::addAndGet,
        (a, b) -> { a.addAndGet(b.intValue()); return a; },
        AtomicInteger::intValue,
        Collector.Characteristics.UNORDERED, Collector.Characteristics.CONCURRENT);
}
You first have to find a suitable mutable container type for your collector. If no such type exists, you have to create your own class. The functions can be implemented as a method reference to an existing method or as a lambda expression.
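As a minimal sketch of the "existing container type" case, the following uses StringBuilder as the mutable container and mixes method references with a lambda. The class name ExistingContainerDemo and the method concat() are hypothetical, chosen only for this illustration.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

class ExistingContainerDemo {
    // Hypothetical helper: concatenates strings, using StringBuilder as the container.
    static Collector<String, StringBuilder, String> concat() {
        return Collector.of(
            StringBuilder::new,        // supplier: reference to an existing constructor
            StringBuilder::append,     // accumulator: reference to the existing append(String)
            (a, b) -> a.append(b),     // combiner: a lambda works just as well
            StringBuilder::toString);  // finisher: reference to an existing method
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("a", "b", "c").collect(concat())); // prints abc
    }
}
```

No new class is needed for the container here; only the four functions are supplied.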
For the more complex example, I don’t know of a suitable existing type for holding an int and a List, but you may get away with a boxed Integer, like this:
final Map<String, Integer> map = …
List<String> keys = map.entrySet().stream().collect(keysToMaximum());

public static <K> Collector<Map.Entry<K,Integer>, ?, List<K>> keysToMaximum() {
    return Collector.of(
        () -> new AbstractMap.SimpleEntry<>(new ArrayList<K>(), Integer.MIN_VALUE),
        (current, next) -> {
            int max = current.getValue(), value = next.getValue();
            if(value >= max) {
                if(value > max) {
                    // a new maximum invalidates the keys collected so far
                    current.setValue(value);
                    current.getKey().clear();
                }
                current.getKey().add(next.getKey());
            }
        }, (a, b) -> {
            int maxA = a.getValue(), maxB = b.getValue();
            if(maxA < maxB) return b;
            if(maxA == maxB) a.getKey().addAll(b.getKey());
            return a;
        },
        Map.Entry::getKey
    );
}
But you may also create a new dedicated container class as an ad-hoc type, not visible outside the particular collector:
public static <K> Collector<Map.Entry<K,Integer>, ?, List<K>> keysToMaximum() {
    return Collector.of(() -> new Object() {
        int max = Integer.MIN_VALUE;
        final List<K> keys = new ArrayList<>();
    }, (current, next) -> {
        int value = next.getValue();
        if(value >= current.max) {
            if(value > current.max) {
                // a new maximum invalidates the keys collected so far
                current.max = value;
                current.keys.clear();
            }
            current.keys.add(next.getKey());
        }
    }, (a, b) -> {
        if(a.max < b.max) return b;
        if(a.max == b.max) a.keys.addAll(b.keys);
        return a;
    },
    a -> a.keys);
}
The takeaway is, you don’t need to create a new, named class to create a Collector.
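For a smaller instance of the same idea, here is a sketch of an averaging collector whose container is an anonymous class with two fields. The names AnonymousContainerDemo and average() are hypothetical, and returning 0.0 for an empty stream is an arbitrary choice made for this illustration.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

class AnonymousContainerDemo {
    // Hypothetical helper: averages integers without declaring a named container class.
    static Collector<Integer, ?, Double> average() {
        return Collector.of(
            () -> new Object() { long sum; long count; },  // ad-hoc container type
            (acc, i) -> { acc.sum += i; acc.count++; },
            (a, b) -> { a.sum += b.sum; a.count += b.count; return a; },
            acc -> acc.count == 0 ? 0.0 : (double) acc.sum / acc.count);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).collect(average())); // prints 2.5
    }
}
```

The container type is inferred from the supplier lambda, so the other lambdas can access its fields even though the type has no name.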
CodePudding user response:
It sounds like you want to supply only the reduction function itself, not all of the other things that come with a generic Collector. Perhaps you're looking for Collectors.reducing.
public static <T> Collector<T,?,T> reducing(T identity, BinaryOperator<T> op)
Then, to sum values, you would write
Collectors.reducing(0, (x, y) -> x + y);
or, in context,
Integer[] myList = new Integer[] { 1, 2, 3, 4 };
var collector = Collectors.reducing(0, (x, y) -> x + y);
System.out.println(Stream.of(myList).collect(collector)); // Prints 10
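Since the question asks for the sum as a String, one possible way to get there (a sketch, not necessarily what the asker needs for the more complex case) is to wrap the reducing collector in Collectors.collectingAndThen. The names ReducingToStringDemo and sumAsString() are hypothetical.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReducingToStringDemo {
    // Hypothetical helper: sums the integers, then converts the result to a String.
    static Collector<Integer, ?, String> sumAsString() {
        return Collectors.collectingAndThen(
            Collectors.reducing(0, Integer::sum),  // the reduction itself
            Object::toString);                     // finishing transformation
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4).collect(sumAsString())); // prints 10
    }
}
```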
CodePudding user response:
I want to focus on the wording of one point of your question, because I feel like it could be the crux of the underlying confusion.
If I want to create my custom collector, I need to create two classes:
- one for the custom accumulation type
- one for the custom collector itself
No, you need to create only one class, that of your custom accumulator. You should use the appropriate factory method to instantiate your custom Collector, as you demonstrate yourself in the question.
Perhaps you meant to say that you need to create two instances. That is also incorrect: you need to create one Collector instance, but to support the general case, many instances of the accumulator can be created (e.g., by groupingBy()). Thus, you can't simply instantiate the accumulator yourself; you need to provide its Supplier to the Collector, and delegate to the Collector the ability to instantiate as many instances as required.
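To make the "many accumulator instances" point concrete, here is a small sketch that counts how often the supplier runs when the collector is used downstream of groupingBy(). The names SupplierCallsDemo, SUPPLIER_CALLS, countingSum() and sumByParity() are hypothetical, and the counter exists only for this demonstration.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SupplierCallsDemo {
    // Counts supplier invocations; for illustration only.
    static final AtomicInteger SUPPLIER_CALLS = new AtomicInteger();

    // A sum collector whose supplier records every container it creates.
    static Collector<Integer, int[], Integer> countingSum() {
        return Collector.of(
            () -> { SUPPLIER_CALLS.incrementAndGet(); return new int[1]; },
            (a, i) -> a[0] += i,
            (a, b) -> { a[0] += b[0]; return a; },
            a -> a[0]);
    }

    // Groups numbers by parity; each group gets its own container from the supplier.
    static Map<Boolean, Integer> sumByParity(Stream<Integer> numbers) {
        return numbers.collect(Collectors.groupingBy(i -> i % 2 == 0, countingSum()));
    }

    public static void main(String[] args) {
        Map<Boolean, Integer> sums = sumByParity(Stream.of(1, 2, 3, 4, 5));
        System.out.println(sums.get(true) + " " + sums.get(false)); // prints 6 9
        System.out.println(SUPPLIER_CALLS.get());                   // prints 2
    }
}
```

With a sequential stream, the supplier runs once per group, which is why a single pre-built accumulator instance would not be enough.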
Now, think about the overloaded Collector.of() method you feel is missing, the "more direct way to do this." Clearly, such a method would still require a Supplier, one that would create instances of your custom accumulator. But Stream.collect() needs to interact with your custom accumulator instances to perform accumulate and combine operations. So the Supplier would have to instantiate something like this Accumulator interface:
public interface Accumulator<T, A extends Accumulator<T, A, R>, R> {
    /**
     * @param t a value to be folded into this mutable result container
     */
    void accumulate(T t);
    /**
     * @param that another partial result to be merged with this container
     * @return the combined results, which may be {@code this}, {@code that}, or a new container
     */
    A combine(A that);
    /**
     * @return the final result of transforming this intermediate accumulator
     */
    R finish();
}
With that, it's then straightforward to create Collector instances from a Supplier<Accumulator>:
static <T, A extends Accumulator<T, A, R>, R>
Collector<T, ?, R> of(Supplier<A> supplier, Collector.Characteristics ... characteristics) {
    return Collector.of(supplier,
        Accumulator::accumulate,
        Accumulator::combine,
        Accumulator::finish,
        characteristics);
}
Then, you'd be able to define your custom Accumulator:
final class Sum implements Accumulator<Integer, Sum, String> {
    private int value;
    @Override
    public void accumulate(Integer next) {
        value += next;
    }
    @Override
    public Sum combine(Sum that) {
        value += that.value;
        return this;
    }
    @Override
    public String finish() {
        return Integer.toString(value);
    }
}
And use it:
String sum = ints.stream().collect(Accumulator.of(Sum::new, Collector.Characteristics.UNORDERED));
Now… it works, and there's nothing too horrible about it, but is all the Accumulator<A extends Accumulator<A>> mumbo-jumbo "more direct" than this?
final class Sum {
    private int value;
    private void accumulate(Integer next) {
        value += next;
    }
    private Sum combine(Sum that) {
        value += that.value;
        return this;
    }
    @Override
    public String toString() {
        return Integer.toString(value);
    }
    static Collector<Integer, ?, String> collector() {
        return Collector.of(Sum::new, Sum::accumulate, Sum::combine, Sum::toString, Collector.Characteristics.UNORDERED);
    }
}
And really, why have an Accumulator dedicated to collecting to a String? Wouldn't reduction to a custom type be more interesting? Something along the lines of IntSummaryStatistics, which has other useful methods like getAverage() alongside toString()? This approach is a lot more powerful, requires only one (mutable) class (the result type), and can encapsulate all of its mutators as private methods rather than implementing a public interface.
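A rough sketch of that idea follows: one mutable class serves as both the accumulation type and the result type, with private mutators and public read-only accessors. The class name Stats and its methods are hypothetical, and returning 0.0 as the average of an empty stream is an assumption made for this illustration.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

final class Stats {
    private long sum;
    private long count;

    // Mutators stay private; only the collector below uses them.
    private void accumulate(Integer next) {
        sum += next;
        count++;
    }

    private Stats combine(Stats that) {
        sum += that.sum;
        count += that.count;
        return this;
    }

    // Read-only accessors form the public face of the result type.
    public long sum() { return sum; }

    public double average() { return count == 0 ? 0.0 : (double) sum / count; }

    @Override
    public String toString() { return Long.toString(sum); }

    // No finisher is passed: the container itself is the result.
    static Collector<Integer, ?, Stats> collector() {
        return Collector.of(Stats::new, Stats::accumulate, Stats::combine,
            Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Stats stats = Stream.of(1, 2, 3, 4).collect(collector());
        System.out.println(stats + " " + stats.average()); // prints 10 2.5
    }
}
```

Callers that only want the String can still call toString() on the result, while others can use the richer accessors.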
So, you're welcome to use something like Accumulator, but it doesn't really fill a real gap in the core Collector repertoire.