Home > OS >  Fastest way to join 2 collections in Java based on a key
Fastest way to join 2 collections in Java based on a key

Time:12-08

I'm looking for the fastest way to merge two unsorted collections based on a common id key.

Below O(N^2) implementation

    for (Person per : pers) {
      for (Data data : datas) {
          if (per.getId().equals(data.getId())) {
              per.getData().add(data);
          }
        } 
      }

I'm looking for the fastest possible way (and lowest memory footprint possible) to achieve this result, possibly O(N). Duplicates should be removed from per.getData(). For now, per.getData() is a HashSet

Any idea how this could be optimized ? I'm using java 11

CodePudding user response:

Do one pass over persons to collect into a map for later O(1) lookup, then do one pass over data adding it to person:

Map<Object, Person> people = pers.stream()
  .collect(Collectors.toMap(Person::getId, p -> p));
datas.forEach(d -> people.get(d.getId()).add(d));

If it’s possible for a data to have a matching person, filter out unmatched data:

datas.stream()
  .filter(d -> people.containsKey(d.getId()))
  .forEach(d -> people.get(d.getId()).add(d));

Both ways are O(m n) (m people, n datas), because all map operations are O(1).

You mentioned that duplicates should be removed from person’s data. Being a HashSet (or any kind of Set), duplicates are automatically removed if equals() and hashCode() are coded properly for Data.

CodePudding user response:

Here's a linear approach (O(n)) that is better than O(n^2) but will use memory.

  • Create a HashMap<personId, personObject> then loop on persons and insert them into the map.
  • Loop on Datas and check if the dataId is present in the HashMap. If it exists, get the personObject and add the dataObject to its HashSet.
HashMap<Integer, Person> mp = new HashMap<>();
for (Person per : pers) {
   mp.put(per.getId(), per);
}
for (Data data : datas) {
   if (mp.get(data.getId()) != null) {
      Person person = mp.get(data.getId());
      person.getData().add(data);
      mp.put(person.getId(), person);
   }
}

Please note that I am assuming that you're using Integers as Ids. You can change the code to suit your case.

  • Related