I'm looking for the fastest way to merge two unsorted collections based on a common id
key.
Below O(N^2) implementation
for (Person per : pers) {
for (Data data : datas) {
if (per.getId().equals(data.getId())) {
per.getData().add(data);
}
}
}
I'm looking for the fastest possible way (and lowest memory footprint possible) to achieve this result, possibly O(N). Duplicates should be removed from per.getData(). For now, per.getData() is a HashSet
Any idea how this could be optimized ? I'm using java 11
CodePudding user response:
Do one pass over persons to collect into a map for later O(1) lookup, then do one pass over data adding it to person:
Map<Object, Person> people = pers.stream()
.collect(Collectors.toMap(Person::getId, p -> p));
datas.forEach(d -> people.get(d.getId()).add(d));
If it’s possible for a data to have a matching person, filter out unmatched data:
datas.stream()
.filter(d -> people.containsKey(d.getId()))
.forEach(d -> people.get(d.getId()).add(d));
Both ways are O(m n) (m people, n datas), because all map operations are O(1).
You mentioned that duplicates should be removed from person’s data. Being a HashSet (or any kind of Set), duplicates are automatically removed if equals()
and hashCode()
are coded properly for Data.
CodePudding user response:
Here's a linear approach (O(n)) that is better than O(n^2) but will use memory.
- Create a HashMap<personId, personObject> then loop on persons and insert them into the map.
- Loop on Datas and check if the dataId is present in the HashMap. If it exists, get the personObject and add the dataObject to its HashSet.
HashMap<Integer, Person> mp = new HashMap<>();
for (Person per : pers) {
mp.put(per.getId(), per);
}
for (Data data : datas) {
if (mp.get(data.getId()) != null) {
Person person = mp.get(data.getId());
person.getData().add(data);
mp.put(person.getId(), person);
}
}
Please note that I am assuming that you're using Integers as Ids. You can change the code to suit your case.