Home > database >  Efficient way to find unique instances with pre-defined rules (Java)
Efficient way to find unique instances with pre-defined rules (Java)

Time:09-24

I am trying to find all unique elements given a list of instances.

I have had a matching algorithm that determines if two items should treated identical

boolean isIdentical(MyItem a, MyItem b) {
//matching algorithm goes here
//return true if a and b are identical with some rules; false otherwise
}

What is the most efficient way to find all unique items?

List<MyItem> findUniqueItems(List<MyItem> allItems) {
}

Due to backward compatibility, I can't use java 8 but will have to find a more iterative way to achieve that.

CodePudding user response:

tl;dr

Given that you are constrained to versions of Java before Java 8…

Override equals and hashCode on your class, using same logic on both.

Instantiate a Set such as HashSet.

Add your objects one at a time, but first ask the set if it already contains that object. If so, you have a duplicate in hand.

Example

Here is some example code. I think this will run in Java 7, though I am using Java 17 today.

Here we define a class Employee with three fields: id (UUID), name (String), and whenHired (LocalDate).

When overriding equals and hashCode, we use only the id field with its UUID value. Study those two methods in this sample code.

package work.basil.comp;

import java.time.LocalDate;
import java.util.UUID;

public final class Employee
{
    private final UUID id;
    private final String name;
    private final LocalDate whenHired;

    public Employee ( UUID id , String name , LocalDate whenHired )
    {
        this.id = id;
        this.name = name;
        this.whenHired = whenHired;
    }

    public UUID id () { return id; }

    public String name () { return name; }

    public LocalDate whenHired () { return whenHired; }

    @Override
    public boolean equals ( Object obj )
    {
        // Match on `id` (UUID) field alone.
        if ( obj == this ) return true;
        if ( obj == null || obj.getClass() != this.getClass() ) return false;
        var that = ( Employee ) obj;
        return this.id.equals( that.id );
    }

    @Override
    public int hashCode ()
    {
        // Hash `id` (UUID) field alone.
        return this.id.hashCode();
    }

    @Override
    public String toString ()
    {
        return "Employee["  
                "id="   id   ", "  
                "name="   name   ", "  
                "whenHired="   whenHired   ']';
    }

}

Make some example data.

Employee alice = new Employee( UUID.fromString( "5228256a-69c5-44d7-8ff5-40703f1b6491" ) , "Alice" , LocalDate.of( 2010 , Month.MARCH , 15 ) );
Employee bob = new Employee( UUID.fromString( "0ee605a2-24bf-4e35-ab68-54007c472717" ) , "Bob" , LocalDate.of( 2021 , Month.JANUARY , 23 ) );
Employee bobby = new Employee( UUID.fromString( "0ee605a2-24bf-4e35-ab68-54007c472717" ) , "Bobby" , LocalDate.of( 2021 , Month.JANUARY , 23 ) );
Employee carol = new Employee( UUID.fromString( "4b63c9ef-4921-46ff-a43a-60741bedf9a3" ) , "Carol" , LocalDate.of( 2019 , Month.DECEMBER , 1 ) );
List < Employee > inputs = new ArrayList <>();
inputs.add( alice );
inputs.add( bob );
inputs.add( bobby );
inputs.add( carol );

Create an empty Set in which to put our Employee objects. A set by definition forbids duplicates. And we can ask the set if it contains a particular object. The objects’ equals method will be invoked to determine the answer to that "contains" query. And in a HashSet, the objects’ hashCode method will be invoked as part of the store-and-retrieve operations on the set.

Set < Employee > employees = new HashSet <>();

Loop our inputs (Employee objects). For each one, ask if it is already contained in our set. If so, we have a duplicate. If not, add it.

for ( Employee input : inputs )
{
    if ( employees.contains( input ) )
    {
        System.out.println( "duplicate = "   input );
    } else
    {
        employees.add( input );
    }
}

When run.

duplicate = Employee[id=0ee605a2-24bf-4e35-ab68-54007c472717, name=Bobby, whenHired=2021-01-23]

If you do not care which items are duplicates, and just want to discard them, replace that for loop with a single line, a call to Set#addAll:

Set < Employee > employees = new HashSet <>();
employees.addAll( inputs );  // Any duplicates are ignored. A `Set` is distinct by definition.

CodePudding user response:

This:

List<MyItem> findUniqueItems(List<MyItem> allItems) {
  List<MyItem> list = new ArrayList<>();
  LinkedHashSet<MyItem> hashSet = new LinkedHashSet<>();
  for(MyItem m: allItems){
      if (hashSet.add(m)) list.add(m);
  }
  return list;
}

Or wrap the hashSet in a LinkedHashList. Of course you have to implement equals and hashCode.

  • Related