I am trying to find all unique elements given a list of instances.
I have had a matching algorithm that determines if two items should treated identical
boolean isIdentical(MyItem a, MyItem b) {
//matching algorithm goes here
//return true if a and b are identical with some rules; false otherwise
}
What is the most efficient way to find all unique items?
List<MyItem> findUniqueItems(List<MyItem> allItems) {
}
Due to backward compatibility, I can't use java 8 but will have to find a more iterative way to achieve that.
CodePudding user response:
tl;dr
Given that you are constrained to versions of Java before Java 8…
Override equals
and hashCode
on your class, using same logic on both.
Instantiate a Set
such as HashSet
.
Add your objects one at a time, but first ask the set if it already contains that object. If so, you have a duplicate in hand.
Example
Here is some example code. I think this will run in Java 7, though I am using Java 17 today.
Here we define a class Employee
with three fields: id (UUID), name (String), and whenHired (LocalDate).
When overriding equals
and hashCode
, we use only the id
field with its UUID value. Study those two methods in this sample code.
package work.basil.comp;
import java.time.LocalDate;
import java.util.UUID;
public final class Employee
{
private final UUID id;
private final String name;
private final LocalDate whenHired;
public Employee ( UUID id , String name , LocalDate whenHired )
{
this.id = id;
this.name = name;
this.whenHired = whenHired;
}
public UUID id () { return id; }
public String name () { return name; }
public LocalDate whenHired () { return whenHired; }
@Override
public boolean equals ( Object obj )
{
// Match on `id` (UUID) field alone.
if ( obj == this ) return true;
if ( obj == null || obj.getClass() != this.getClass() ) return false;
var that = ( Employee ) obj;
return this.id.equals( that.id );
}
@Override
public int hashCode ()
{
// Hash `id` (UUID) field alone.
return this.id.hashCode();
}
@Override
public String toString ()
{
return "Employee["
"id=" id ", "
"name=" name ", "
"whenHired=" whenHired ']';
}
}
Make some example data.
Employee alice = new Employee( UUID.fromString( "5228256a-69c5-44d7-8ff5-40703f1b6491" ) , "Alice" , LocalDate.of( 2010 , Month.MARCH , 15 ) );
Employee bob = new Employee( UUID.fromString( "0ee605a2-24bf-4e35-ab68-54007c472717" ) , "Bob" , LocalDate.of( 2021 , Month.JANUARY , 23 ) );
Employee bobby = new Employee( UUID.fromString( "0ee605a2-24bf-4e35-ab68-54007c472717" ) , "Bobby" , LocalDate.of( 2021 , Month.JANUARY , 23 ) );
Employee carol = new Employee( UUID.fromString( "4b63c9ef-4921-46ff-a43a-60741bedf9a3" ) , "Carol" , LocalDate.of( 2019 , Month.DECEMBER , 1 ) );
List < Employee > inputs = new ArrayList <>();
inputs.add( alice );
inputs.add( bob );
inputs.add( bobby );
inputs.add( carol );
Create an empty Set
in which to put our Employee
objects. A set by definition forbids duplicates. And we can ask the set if it contains a particular object. The objects’ equals
method will be invoked to determine the answer to that "contains" query. And in a HashSet
, the objects’ hashCode
method will be invoked as part of the store-and-retrieve operations on the set.
Set < Employee > employees = new HashSet <>();
Loop our inputs (Employee
objects). For each one, ask if it is already contained in our set. If so, we have a duplicate. If not, add it.
for ( Employee input : inputs )
{
if ( employees.contains( input ) )
{
System.out.println( "duplicate = " input );
} else
{
employees.add( input );
}
}
When run.
duplicate = Employee[id=0ee605a2-24bf-4e35-ab68-54007c472717, name=Bobby, whenHired=2021-01-23]
If you do not care which items are duplicates, and just want to discard them, replace that for
loop with a single line, a call to Set#addAll
:
Set < Employee > employees = new HashSet <>();
employees.addAll( inputs ); // Any duplicates are ignored. A `Set` is distinct by definition.
CodePudding user response:
This:
List<MyItem> findUniqueItems(List<MyItem> allItems) {
List<MyItem> list = new ArrayList<>();
LinkedHashSet<MyItem> hashSet = new LinkedHashSet<>();
for(MyItem m: allItems){
if (hashSet.add(m)) list.add(m);
}
return list;
}
Or wrap the hashSet in a LinkedHashList. Of course you have to implement equals and hashCode.