My server has a bottleneck caused by flattening a List<List<SomeObject>> into a List<SomeObject>. CPU usage on the server spiked well above normal.
The data structure is:
Object:
List<SomeObject> childList;
I am trying to flat-map a List<Object> into a List<SomeObject> in the most computationally efficient way.
Given parentList, a List<Object>:
I tried:
parentList.stream().flatMap(child -> child.getChildList().stream()).collect(Collectors.toList())
Also tried:
List<SomeObject> all = new ArrayList<>();
parentList.forEach(child -> all.addAll(child.getChildList()));
Any other suggestions? These two seem to cost about the same, and both are fairly expensive because of the copying that happens under the hood.
CodePudding user response:
This may be more efficient since it avoids creating a separate stream for each inner list, as flatMap does. mapMulti was introduced in Java 16. Its mapper receives each stream element and a consumer; whatever is passed to the consumer is placed on the resulting stream, in this case each inner list's elements.
List<List<Object>> lists = new ArrayList<>(
        List.of(List.of("1", "2", "3"),
                List.of("4", "5", "6", "7"),
                List.of("8", "9")));
List<Object> list = lists.stream()
        .mapMulti((lst, consumer) -> lst.forEach(consumer))
        .toList();
System.out.print(list);
prints
[1, 2, 3, 4, 5, 6, 7, 8, 9]
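Applied to the structure in the question, the same idea might look like the sketch below. The Parent and SomeObject records are hypothetical stand-ins for the asker's types, and the explicit <SomeObject> type witness is there because the compiler cannot infer the result element type from the lambda alone. It needs Java 16 or later, and whether it actually beats flatMap depends on the workload, so it is worth measuring.

```java
import java.util.List;

class MapMultiFlatten {
    // Hypothetical stand-ins for the types described in the question.
    record SomeObject(String name) {}

    record Parent(List<SomeObject> childList) {
        List<SomeObject> getChildList() {
            return childList;
        }
    }

    // Pushes every child of every parent into the downstream sink,
    // without creating an intermediate stream per parent.
    static List<SomeObject> flatten(List<Parent> parentList) {
        return parentList.stream()
                .<SomeObject>mapMulti((parent, sink) -> parent.getChildList().forEach(sink))
                .toList();
    }

    public static void main(String[] args) {
        List<Parent> parents = List.of(
                new Parent(List.of(new SomeObject("a"), new SomeObject("b"))),
                new Parent(List.of(new SomeObject("c"))));
        System.out.println(flatten(parents).size()); // prints 3
    }
}
```

Note that Stream.toList() returns an unmodifiable list; use collect(Collectors.toCollection(ArrayList::new)) if the result has to be mutable.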
CodePudding user response:
Do we know more about which List implementation is used?
I would try to initialize the result list with the expected size. This avoids the repeated resizing and copying of the backing array as the list grows. It assumes that the size of each child list can be retrieved cheaply.
int expectedSize = parentList.stream()
.mapToInt(entry -> entry.getChildList().size())
.sum();
List<SomeObject> result = new ArrayList<>(expectedSize);
for (var entry : parentList) {
result.addAll(entry.getChildList());
}
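As a self-contained sketch of the same idea, written with plain loops so it also runs on Java 8: the Parent class here is a hypothetical stand-in for the type in the question, and String is used as the child type only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PresizedFlatten {
    // Hypothetical stand-in for the parent type in the question.
    static class Parent {
        private final List<String> childList;

        Parent(List<String> childList) {
            this.childList = childList;
        }

        List<String> getChildList() {
            return childList;
        }
    }

    static List<String> flatten(List<Parent> parentList) {
        // First pass: count the children so the backing array is allocated once.
        int expectedSize = 0;
        for (Parent parent : parentList) {
            expectedSize += parent.getChildList().size();
        }
        // Second pass: bulk-copy each child list into the presized result.
        List<String> result = new ArrayList<>(expectedSize);
        for (Parent parent : parentList) {
            result.addAll(parent.getChildList());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Parent> parents = Arrays.asList(
                new Parent(Arrays.asList("1", "2", "3")),
                new Parent(Arrays.asList("4", "5")));
        System.out.println(flatten(parents)); // prints [1, 2, 3, 4, 5]
    }
}
```

The extra counting pass only pays off if size() is cheap, which it is for ArrayList and most standard List implementations.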
CodePudding user response:
In Java 8:
List<Object> listOne = new ArrayList<>();
List<Object> listTwo = new ArrayList<>();
List<Object> listThree = new ArrayList<>();
...
Use Stream.of(...) to concatenate many lists:
List<Object> newList = Stream.of(listOne,listTwo,listThree).flatMap(Collection::stream).collect(Collectors.toList());
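In Java 16 (Stream.concat takes streams rather than lists, so each list needs a .stream() call):
List<Object> newList = Stream.concat(Stream.concat(listOne.stream(), listTwo.stream()), listThree.stream()).toList();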
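Much like an ETL ("Extract, Transform, Load") process, a stream pushes a collection of data through a pipeline of processing stages. A stream runs sequentially on the calling thread by default; it uses multiple threads only when it is made parallel, for example with parallelStream() or parallel().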
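A stream is sequential unless it is explicitly switched to parallel, which the minimal sketch below illustrates. The StreamModes class and its flatten helper are made up for this example. Parallel flattening is not automatically faster, since splitting the work and merging the partial results adds overhead of its own, so measure before relying on it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamModes {
    // Flattens the nested lists, optionally on a parallel stream.
    // Encounter order is preserved in both modes.
    static List<String> flatten(List<List<String>> lists, boolean parallel) {
        Stream<List<String>> source = parallel ? lists.stream().parallel() : lists.stream();
        return source
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("1", "2"),
                Arrays.asList("3", "4", "5"));
        System.out.println(lists.stream().isParallel());            // prints false
        System.out.println(lists.stream().parallel().isParallel()); // prints true
        System.out.println(flatten(lists, true));                   // prints [1, 2, 3, 4, 5]
    }
}
```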
CodePudding user response:
One way to make the flattening cheaper is to use a plain for loop instead of the Stream API or the forEach method. The loop iterates over the parent list and appends each element's child list to the flat list, which avoids the overhead of building a stream pipeline and running a collector. Storing the result in an ArrayList rather than a LinkedList generally helps as well, since ArrayList.addAll copies the elements into its backing array in bulk, while LinkedList allocates a node per element.
List<SomeObject> flatList = new ArrayList<>();
for (Object o : parentList) {
    flatList.addAll(o.getChildList());
}
Another way would be to use an explicit Iterator, the interface for traversing a collection. Note that an enhanced for loop over a List already compiles down to an Iterator, so this form is equivalent to the loop above rather than faster.
List<SomeObject> flatList = new ArrayList<>();
Iterator<Object> iterator = parentList.iterator();
while (iterator.hasNext()) {
    Object o = iterator.next();
    flatList.addAll(o.getChildList());
}
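Note that java.util.List has no concat method, so lists cannot be concatenated that way (Stream.concat exists, but it joins two streams). The call that appends one list to another is addAll, which makes this the same as the for loop above:
List<SomeObject> flatList = new ArrayList<>();
for (Object o : parentList) {
    flatList.addAll(o.getChildList()); // List has addAll, not concat
}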
There are several resources that you can use for additional reading on this topic. Here are a few that I would recommend:
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/List.html
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/ArrayList.html
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/Iterator.html
https://www.oreilly.com/library/view/java-performance-the/9781449358652/
https://www.tutorialspoint.com/java_data_structure_algorithms/index.htm