I was writing some code and noticed that map and collect seem to do exactly the same thing, so I am wondering when to use which, and which is more efficient.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public class Main {
    public static void main(String[] args) {
        List<Car> list = Arrays.asList(
                new Car("Honda", "Civic", 2023),
                new Car("Toyota", "Camry", 2023),
                new Car("BMW", "M4", 2021));

        // Variant 1: map each car to its brand, then collect the brands into a list.
        List<String> nameList = list.stream().map(Car::brand).collect(Collectors.toList());

        // Variant 2: do the mapping inside the three-argument collect.
        List<String> nameListCollect = list.stream().collect(ArrayList::new, (lists, car) -> lists.add(car.brand()), ArrayList::addAll);
    }
}

record Car(String brand, String model, Integer year) {}
CodePudding user response:
So, the first thing to note is that map and collect do not do the same thing. This is actually clear from your example; both of your expressions have to call collect, because if they didn't then you wouldn't end up with a List.
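To make that distinction concrete, here is a minimal sketch. The class name MapVersusCollect and its helper methods are made up for illustration, and the Car record is copied from the question. It shows that map on its own only hands back another Stream, while collect is the step that actually produces the List:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; only Car comes from the question.
class MapVersusCollect {
    record Car(String brand, String model, Integer year) {}

    // map is an intermediate operation: the result is still a Stream, not a List.
    static Stream<String> brands(List<Car> cars) {
        return cars.stream().map(Car::brand);
    }

    // collect is a terminal operation: it consumes the stream and builds the result.
    static List<String> brandList(List<Car> cars) {
        return brands(cars).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(
                new Car("Honda", "Civic", 2023),
                new Car("BMW", "M4", 2021));
        System.out.println(brandList(cars)); // prints [Honda, BMW]
    }
}
```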
Instead, what you've found is that if a call to map is followed by something else, then you can often eliminate map in favor of incorporating the mapper's logic into that something else. That something else doesn't have to be a call to collect; it can even be another call to map. For example, these two expressions are equivalent:
stream.map(Object::toString).map(String::toLowerCase)
stream.map(obj -> obj.toString().toLowerCase())
where the former uses two separate mappers in sequence, while the latter combines them into a single expression.
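If you want to convince yourself of that equivalence, a small runnable sketch along these lines should do. The class FusedMappers, its method names, and the sample values are assumptions for illustration only:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class comparing two chained mappers with one combined mapper.
class FusedMappers {
    // Two simple mappers applied in sequence.
    static List<String> chained(List<Object> items) {
        return items.stream()
                .map(Object::toString)
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    // The same logic folded into a single mapper.
    static List<String> fused(List<Object> items) {
        return items.stream()
                .map(obj -> obj.toString().toLowerCase())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> items = List.of("Honda", 2023, 'X');
        System.out.println(chained(items));                      // prints [honda, 2023, x]
        System.out.println(fused(items).equals(chained(items))); // prints true
    }
}
```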
The important difference is not in performance, but in readability, clarity, and maintainability, and those are matters of human judgment that depend on context. There are no hard-and-fast rules.
In your example, I think that .map(Car::brand).collect(Collectors.toList()) is infinitely clearer than .collect(ArrayList::new, (lists, car) -> lists.add(car.brand()), ArrayList::addAll); but if you think that the latter is clearer, then I can't really point to any objective facts that prove otherwise. (I suppose you could survey the uses of collect on (say) GitHub to show that calls to collect(Collectors.toList()) are far more common than calls to the three-parameter overload of collect; but that wouldn't really prove that the former is more readable, or more readable in your case.)
More broadly, I think the stream syntax works best when each step in the chain of method calls is very simple, and preferably written on a separate line. .map(Car::brand) and .collect(Collectors.toList()) are both very simple and clear (if you're used to streams), whereas .collect(ArrayList::new, (lists, car) -> lists.add(car.brand()), ArrayList::addAll) is a large irreducible chunk that you have to think about for a while.
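For a side-by-side look, here is a sketch of both pipelines from the question, laid out one step per line. The class and method names (BrandPipelines, withMap, withCollectOnly) are invented for this illustration. Both return the same brands; the difference is only in how much you have to parse at each step:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; only Car and the two pipelines come from the question.
class BrandPipelines {
    record Car(String brand, String model, Integer year) {}

    // One simple step per line: map, then collect.
    static List<String> withMap(List<Car> cars) {
        return cars.stream()
                .map(Car::brand)
                .collect(Collectors.toList());
    }

    // Three-argument collect: supplier, accumulator (which also does the mapping), combiner.
    static List<String> withCollectOnly(List<Car> cars) {
        return cars.stream()
                .collect(ArrayList::new,
                        (brands, car) -> brands.add(car.brand()),
                        ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(
                new Car("Honda", "Civic", 2023),
                new Car("Toyota", "Camry", 2023),
                new Car("BMW", "M4", 2021));
        System.out.println(withMap(cars));                               // prints [Honda, Toyota, BMW]
        System.out.println(withMap(cars).equals(withCollectOnly(cars))); // prints true
    }
}
```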
And when it's not possible to break up the expression into a chain of simple calls like this, I think you're actually better off going back to simple imperative code (for-loops and so on) rather than using complicated stream calls. But again, if you disagree, I can't objectively prove that this is clearer. It just is!
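For comparison, this is roughly what the imperative fallback looks like for the brand example. It is only a sketch: the class BrandLoop and its brands method are hypothetical names, and for a case this simple the stream version is arguably just as readable:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing the plain for-loop equivalent.
class BrandLoop {
    record Car(String brand, String model, Integer year) {}

    // Build the list of brands with an ordinary enhanced for-loop.
    static List<String> brands(List<Car> cars) {
        List<String> result = new ArrayList<>();
        for (Car car : cars) {
            result.add(car.brand());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(
                new Car("Honda", "Civic", 2023),
                new Car("Toyota", "Camry", 2023),
                new Car("BMW", "M4", 2021));
        System.out.println(brands(cars)); // prints [Honda, Toyota, BMW]
    }
}
```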