I am really confused by the equals() and hashCode() methods after reading lots of documentation and articles. Mainly, the many different kinds of examples and usages are what confuse me.
So, could you clarify the following points for me?
1. If there is no unique field in an entity (except for the id field), should we use the getClass() method or only the id field in the equals() method, as shown below?
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (getClass() != o.getClass()) return false;
    // code omitted
}
2. If there is a unique key, e.g. private String isbn;, should we use only this field? Or should we combine it with getClass(), as shown below?
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (getClass() != o.getClass()) return false;
    Book book = (Book) o;
    return isbn == book.isbn;
}
3. What about NaturalId? As far as I understand, it is used for unique fields, e.g. private String isbn;. What is its purpose? Is it related to the equals() and hashCode() methods?
CodePudding user response:
It all boils down to what your class actually represents, what its identity is, and when the JVM should consider two objects to be the same. The context in which the class is used determines its behavior (in this case, its equality to another object).
By default, Java considers two objects "the same" only if they are the very same instance (comparison using ==). While that makes sense as a strictly technical check, Java applications usually represent a business domain, where multiple objects may be constructed that should still be considered the same. An example of that could be a book (as in your question). But what does it mean that a book is the same as another?
See - it depends.
When you ask someone if they have read a certain book, you give them a title and the author; they try to "match" it against the books they've read and see if any of them meets the criteria you provided. So equals in this case would check whether the title and the author of a given book are the same as the other's. Simple.
Now imagine that you're a Tolkien fan. If you were Polish (like me), you could have multiple "Lord of the Rings" translations available to read, but (as a fan) you would know about some translators that went a bit too far and you would like to avoid them. The title and the author are not enough; you're looking for a book with a certain ISBN that will let you find a specific edition of the book. Since the ISBN identifies one specific edition, and therefore implies the title and the author, it's not required to use them in the equals method in this case.
The third (and final) book-related example is related to a library. Both situations described above could easily happen at a library, but from the librarian's point of view books are also another thing: an "item". Each book in the library (it's just an assumption, I've never worked with such a system) has its own identifier, which can be completely separate from the ISBN (but could also be an ISBN plus something extra). When you return a book to the library, it's the library identifier that matters, and it should be used in this case.
To sum up: a Book as an abstraction does not have a single "equality definition". It depends on the context. Let's say we create the following set of classes (most likely in more than one context):
Book
BookEdition
BookItem
BookOrder (not yet in the library)
Book and BookEdition are more like value objects, while BookItem and BookOrder are entities. Value objects are represented only by their values, and even though they do not have an identifier, they can be equal to other ones. Entities, on the other hand, can include values or can even consist of value objects (e.g. BookItem could contain a BookEdition field next to its libraryId field), but they have an identifier which defines whether they are the same as another (even if their values change). Books are not a good example here (unless we imagine reassigning a library identifier to another book), but a user that changed their username is still the same user, identified by their ID.
With regard to checking the class of the object passed to the equals method: it is highly advised (yet not enforced by the compiler in any way) to verify that the object is of the expected type before casting it, in order to avoid a ClassCastException. To do that, instanceof or getClass() should be used. If the object is of the expected type, you can cast it (e.g. Book other = (Book) object;) and only then can you access the properties of the book (libraryId, isbn, title, author), because an object of type Object doesn't have such fields or accessors to them.
You're not explicitly asking about that in your question, but the choice between instanceof and getClass() can be similarly unclear. A rule of thumb would be: use getClass(), as it helps to avoid problems with symmetry when subclasses are involved.
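To illustrate the symmetry problem mentioned above, here is a minimal sketch. Publication and Audiobook are hypothetical classes invented for this example (they are not part of the model in this answer), and the ISBN value is a placeholder. Both classes use instanceof, and the subclass adds a field to the comparison:

```java
import java.util.Objects;

// Hypothetical classes, used only to illustrate the symmetry problem.
class Publication {
    protected final String isbn;

    Publication(String isbn) {
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof also accepts subclasses such as Audiobook
        if (!(o instanceof Publication)) return false;
        return Objects.equals(isbn, ((Publication) o).isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(isbn);
    }
}

class Audiobook extends Publication {
    private final String narrator;

    Audiobook(String isbn, String narrator) {
        super(isbn);
        this.narrator = narrator;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // a plain Publication is never an Audiobook
        if (!(o instanceof Audiobook)) return false;
        Audiobook other = (Audiobook) o;
        return Objects.equals(isbn, other.isbn) && Objects.equals(narrator, other.narrator);
    }
    // hashCode() is inherited from Publication (based on isbn only)
}

public class SymmetryDemo {
    public static void main(String[] args) {
        Publication publication = new Publication("placeholder-isbn");
        Audiobook audiobook = new Audiobook("placeholder-isbn", "some narrator");

        System.out.println(publication.equals(audiobook)); // true
        System.out.println(audiobook.equals(publication)); // false, so symmetry is broken
    }
}
```

If both classes compared getClass() instead, both calls would return false and symmetry would hold.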
Natural IDs can vary depending on the context. In the case of a BookEdition, an ISBN is a natural ID, but in the case of just a Book it would be the pair of the title and the author (as a separate class). You can read more about the concept of a natural ID in Hibernate in the docs.
It is important to understand that if you have a table in the database, it can be mapped to different types of objects in a more complex domain. ORM tools should help us with management and mapping of data, but the objects defined as data representation are (or rather: usually should be) a different layer of abstraction than the domain model.
Yet if you were forced to use, for example, the BookItem as your data-modeling class, libraryId could probably be an ID in the database context, but isbn would not be a natural ID, since it does not uniquely identify the BookItem. If BookEdition were the data-modeling class, it could contain an ID autogenerated by the database (the ID in the database context) and an ISBN, which in this case would be the natural ID, as it uniquely identifies a BookEdition in the book editions context.
To avoid such problems and make the code more flexible and descriptive, I'd suggest treating data as data and domain as domain, which is related to domain-driven design. A natural ID (as a concept) is present only on the domain level of the code as it can vary and evolve and you can still use the same database table to map the data into those various objects, depending on the business context.
Here's a code snippet with the classes described above and a class representing a table row from the database.
// getters and hashCode() omitted in all classes for simplicity
class Book {
    private String title;
    private String author;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        Book book = (Book) object;
        // title and author identify the book
        return title.equals(book.title)
                && author.equals(book.author);
    }

    static Book fromDatabaseRow(BookRow bookRow) {
        var book = new Book();
        book.title = bookRow.title;
        book.author = bookRow.authorName + " " + bookRow.authorSurname;
        return book;
    }
}
class BookEdition {
    private String title;
    private String author;
    private String isbn;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookEdition book = (BookEdition) object;
        // isbn identifies the book edition
        return isbn.equals(book.isbn);
    }

    static BookEdition fromDatabaseRow(BookRow bookRow) {
        var edition = new BookEdition();
        edition.title = bookRow.title;
        edition.author = bookRow.authorName + " " + bookRow.authorSurname;
        edition.isbn = bookRow.isbn;
        return edition;
    }
}
class BookItem {
    private long libraryId;
    private String title;
    private String author;
    private String isbn;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookItem book = (BookItem) object;
        // libraryId identifies the book item in the library system
        return libraryId == book.libraryId;
    }

    static BookItem fromDatabaseRow(BookRow bookRow) {
        var item = new BookItem();
        item.libraryId = bookRow.id;
        item.title = bookRow.title;
        item.author = bookRow.authorName + " " + bookRow.authorSurname;
        item.isbn = bookRow.isbn;
        return item;
    }
}
// database table representation (represents data, is not a domain object)
class BookRow {
    // fields are package-private here so that the fromDatabaseRow() factories above can read them;
    // with private fields you would go through the omitted getters instead
    long id;
    String isbn;
    String title;
    // author should be a separate table joined by FK - done this way for simplification
    String authorName;
    String authorSurname;
    // could have other fields as well - e.g. date of addition to the library
    Timestamp addedDate; // java.sql.Timestamp

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookRow book = (BookRow) object;
        // id identifies the ORM entity (a row in the database table represented as a Java object)
        return id == book.id;
    }
}
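The classes above leave out hashCode(). Here is a minimal sketch of how it could look for BookEdition, assuming that hashCode() should be derived from the same field that equals() compares. The class is a trimmed-down variant; the constructor, the getter, the placeholder ISBN and the HashCodeDemo class are additions made only for this demo:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Trimmed-down variant of BookEdition; the constructor exists only for this demo.
class BookEdition {
    private final String title;
    private final String isbn;

    BookEdition(String title, String isbn) {
        this.title = title;
        this.isbn = isbn;
    }

    String getTitle() {
        return title;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookEdition other = (BookEdition) object;
        // isbn identifies the book edition
        return Objects.equals(isbn, other.isbn);
    }

    @Override
    public int hashCode() {
        // built from the same field as equals(), so equal editions share a hash code
        return Objects.hashCode(isbn);
    }
}

public class HashCodeDemo {
    public static void main(String[] args) {
        Set<BookEdition> editions = new HashSet<>();
        editions.add(new BookEdition("The Lord of the Rings", "placeholder-isbn"));
        editions.add(new BookEdition("Lord of the Rings, The", "placeholder-isbn"));

        // both objects describe the same edition, so the set keeps only one of them
        System.out.println(editions.size()); // 1
    }
}
```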
CodePudding user response:
getClass()
With regard to the usage of getClass(), everything is straightforward.
The equals() method expects an argument of type Object.
It's important to ensure that you're dealing with an instance of the same class before casting and comparing attributes; otherwise you can end up with a ClassCastException. getClass() can be used for that purpose: if the objects do not belong to the same class, they are clearly not equal.
Natural Id vs Surrogate Id
When you talk about a "NaturalId" like the ISBN of a book versus an "id", I guess you are referring to the natural key of a persistent entity versus the surrogate key used in a relational database.
Opinions differ on this point. The generally recommended approach (see the Hibernate User Guide and the other references below) is to use the natural id (a set of unique properties, also called a business key) in your application, and to use the ID that the entity obtains after being persisted only in the database.
You may encounter hashCode() and equals() implementations that are based on the surrogate id and make a defensive null check to guard against the case where the entity is in the transient state and its id is null. With such an implementation, a transient entity is not equal to an entity in the persistent state that has the same properties (apart from the non-null id). Personally, I don't think this approach is correct.
The following code sample is taken from the most recent official Hibernate 6.1 User Guide:
Example 142. Natural Id equals/hashCode
@Entity(name = "Book")
public static class Book {
    @Id
    @GeneratedValue
    private Long id;
    private String title;
    private String author;
    @NaturalId
    private String isbn;

    //Getters and setters are omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        Book book = (Book) o;
        return Objects.equals(isbn, book.isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(isbn);
    }
}
The code above, which uses business keys, is presented in the guide as the final approach, in contrast to the implementation based on surrogate keys, which is called a naive implementation (see Example 139 and further).
The same reasoning for choosing between the ID and the natural key is described here:
You have to override the equals() and hashCode() methods if you
intend to put instances of persistent classes in a Set (the recommended way to represent many-valued associations) and
intend to use reattachment of detached instances
Hibernate guarantees equivalence of persistent identity (database row) and Java identity only inside a particular session scope. So as soon as we mix instances retrieved in different sessions, we must implement equals() and hashCode() if we wish to have meaningful semantics for Sets.
The most obvious way is to implement equals()/hashCode() by comparing the identifier value of both objects. If the value is the same, both must be the same database row, they are therefore equal (if both are added to a Set, we will only have one element in the Set). Unfortunately, we can't use that approach with generated identifiers! Hibernate will only assign identifier values to objects that are persistent, a newly created instance will not have any identifier value! Furthermore, if an instance is unsaved and currently in a Set, saving it will assign an identifier value to the object. If equals() and hashCode() are based on the identifier value, the hash code would change, breaking the contract of the Set. See the Hibernate website for a full discussion of this problem. Note that this is not a Hibernate issue, but normal Java semantics of object identity and equality.
We recommend implementing equals() and hashCode() using Business key equality.
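The problem described in the quote can be reproduced without Hibernate. In the sketch below, GeneratedIdBook is a hypothetical class, and assigning the id by hand merely simulates what saving the entity would do:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity whose equals()/hashCode() are based on a generated identifier.
class GeneratedIdBook {
    Long id; // stays null until the entity is "saved"

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(id, ((GeneratedIdBook) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while the id is null
    }
}

public class IdHashDemo {
    public static void main(String[] args) {
        Set<GeneratedIdBook> books = new HashSet<>();
        GeneratedIdBook book = new GeneratedIdBook();

        books.add(book);  // stored under the hash code of a null id
        book.id = 42L;    // simulates the identifier assignment on save

        // the set now looks in a different bucket and no longer finds the object
        System.out.println(books.contains(book)); // false
    }
}
```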
For more information, have a look at the article "The best way to map a @NaturalId business key with JPA and Hibernate" (Sep 15, 2021) by Vlad Mihalcea, which covers how to improve the caching of query results with natural keys, and at these questions:
CodePudding user response:
- If there is not any unique field in an entity (except from id field) then should we use getClass() method or only id field in the equals() method as shown below?
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (getClass() != o.getClass()) return false;
    // code omitted
}
Comparing classes in the equals() implementation achieves two goals:
- we make sure that we do not compare apples with oranges (although in some designs such a comparison could be legitimate)
- the code you omitted must cast Object o to some known class, otherwise we would be unable to extract the required information from it, so the check makes equals() safe: nobody expects to get a ClassCastException when calling Set#add, for example. Using instanceof there does not seem to be a good idea, because it can violate the symmetry and transitivity requirements of the equals() contract once subclasses override equals().
It is also worth noting that calling o.getClass() can cause unexpected behaviour when Object o is a proxy; some people prefer to call Hibernate.getClass(o) instead, or to implement other tricks.
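Here is a minimal sketch of the proxy problem, assuming (as is typically the case for ORM proxies) that the proxy is a runtime-generated subclass of the entity class. LazyBookProxy is a hand-written stand-in for such a generated class and PlainBook is a hypothetical entity; no Hibernate API is used here:

```java
import java.util.Objects;

// Hypothetical entity with a strict getClass() check in equals().
class PlainBook {
    private final String isbn;

    PlainBook(String isbn) {
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(isbn, ((PlainBook) o).isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(isbn);
    }
}

// Stand-in for a proxy class that an ORM would generate at runtime.
class LazyBookProxy extends PlainBook {
    LazyBookProxy(String isbn) {
        super(isbn);
    }
}

public class ProxyDemo {
    public static void main(String[] args) {
        PlainBook entity = new PlainBook("placeholder-isbn");
        PlainBook proxy = new LazyBookProxy("placeholder-isbn");

        // same business key, but the runtime classes differ
        System.out.println(entity.equals(proxy)); // false
    }
}
```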
- If there is a unique key e.g. private String isbn;, then should we use only this field? Or should we combine it with getClass() as shown below?
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (getClass() != o.getClass()) return false;
    Book book = (Book) o;
    return isbn == book.isbn;
}
That is a very controversial topic; below are some thoughts on the problem:
- it is a good idea to maintain a PK column for each DB table: it costs almost nothing, but simplifies a lot of things. Imagine someone asks you to delete some rows, and instead of delete from tbl where id=... you need to write delete from tbl where field1=... and field2=... and ...
- PKs should not be composite, otherwise you might get surprised by queries like select count(distinct field1, field2) from tbl
- the argument that entities get their IDs only when they are stored in the DB, and that we therefore can't rely on surrogate ids in equals and hashCode, is just wrong. Yes, it is common behaviour in most JPA projects, but you always have the option to generate and assign IDs manually. Some examples:
  - EclipseLink UserGuide: "By default, the entities Id must be set by the application, normally before the persist is called. A @GeneratedValue can be used to have EclipseLink generate the Id value." I believe it is clear enough that @GeneratedValue is just an extra feature and nobody prevents you from creating your own object factory.
  - Hibernate User Guide: "Values for simple identifiers can be assigned, which simply means that the application itself will assign the value to the identifier attribute prior to persisting the entity."
  - some popular persistent storages (Cassandra, MongoDB) do not have out-of-the-box auto-increment functionality, yet nobody would say that those storages do not allow implementing high-level ideas like DDD, etc.
- in such discussions examples make sense, but book/author/isbn is not a good one. Here is something more practical: my DB contains about 1000 tables, and just 3 of them contain something similar to a natural id, so please give me a reason why I should not use surrogate ids there
- it is not always possible to use natural ids even when they exist. Some examples:
  - bank card PAN: it seems to be unique, however you must not even store it in the DB (I believe SSN and VIN are also security sensitive)
  - no matter what anyone says, thinking that natural ids never change is too naive, whereas surrogate ids never change
  - they may have a bad format: too long, case insensitive, containing unsafe symbols, etc.
  - it is not possible to implement a soft deletes feature when we are using natural ids
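As a sketch of the application-assigned identifier idea from the list above: the entity below receives its id at construction time, so equals() and hashCode() can rely on it both before and after saving. Account is a hypothetical class, and using a random UUID as the identifier is an assumption made for this example, not something the quoted guides prescribe:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity whose identifier is assigned by the application, not by the database.
class Account {
    private final UUID id = UUID.randomUUID(); // available before any save
    private String owner;

    Account(String owner) {
        this.owner = owner;
    }

    void rename(String owner) {
        this.owner = owner;
    }

    String getOwner() {
        return owner;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id.equals(((Account) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // never changes during the object's lifetime
    }
}

public class AssignedIdDemo {
    public static void main(String[] args) {
        Set<Account> accounts = new HashSet<>();
        Account account = new Account("alice");

        accounts.add(account);
        account.rename("bob"); // changing a value does not change the identity

        System.out.println(accounts.contains(account));         // true
        System.out.println(account.equals(new Account("bob"))); // false, the ids differ
    }
}
```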
P.S. Vlad Mihalcea has provided an amusing implementation of hashCode():
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Book))
        return false;
    Book other = (Book) o;
    return id != null &&
            id.equals(other.getId());
}

@Override
public int hashCode() {
    return getClass().hashCode();
}
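To see why this implementation copes with generated identifiers, here is a sketch without any persistence framework. SketchBook is a hypothetical stand-in for the entity, and setting the id by hand simulates the identifier assignment that happens on save:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an entity using id-based equals() and a constant hashCode().
class SketchBook {
    private Long id; // null until the entity is "saved"

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchBook)) return false;
        SketchBook other = (SketchBook) o;
        // two distinct unsaved instances are never equal
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        // the same value for every instance, before and after the id is assigned
        return getClass().hashCode();
    }
}

public class ConstantHashDemo {
    public static void main(String[] args) {
        Set<SketchBook> books = new HashSet<>();
        SketchBook book = new SketchBook();

        books.add(book);
        book.setId(1L); // simulates the identifier assignment on save

        System.out.println(books.contains(book));                      // true
        System.out.println(new SketchBook().equals(new SketchBook())); // false
    }
}
```

The price is that all instances of the class land in the same hash bucket, so lookups in large hash-based collections of such entities can degrade.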
With regard to the Hibernate documentation, the problem is that its synthetic cases have nothing in common with the real world. Let's consider their dummy author/book model and try to extend it. Imagine I'm a publisher and I want to keep records of my authors, their books and drafts. What is the difference between a book and a draft? A book has an ISBN assigned and a draft does not, but a draft may one day become a book (or may not). How do we keep the Java equals/hashCode contracts for drafts in such a case?