I am developing an API where I am confused as to what is the efficient way to handle join query. I want to join 2 tables data and return the response. Either I can query the database with join query and fetch the result and then return the response OR I can fire two separate queries and then I would handle the join in the API on the fly and return the response. Which is the efficient and correct way ?
CodePudding user response:
Databases are pretty much faster than querying and joining as class instances. Always do joins in the database and map them from the code. Also look for any lazy loading if possible. Cause in a situation like below:
@Entity
@Table(name = "USER")
public class UserLazy implements Serializable {
@Id
@GeneratedValue
@Column(name = "USER_ID")
private Long userId;
@OneToMany(fetch = FetchType.LAZY, mappedBy = "user")
private Set<OrderDetail> orderDetail = new HashSet();
// standard setters and getters
// also override equals and hashcode
}
you might not want order details when you want the initial results.
CodePudding user response:
Usually it's more efficient to do the join in the database, but there are some corner cases, mostly due to the fact that application CPU time is cheaper than database CPU time. Here are a few examples that come to mind, with a query like "table A join table B":
- B is a small table that rarely changes.
In this case it can be profitable to cache the contents of this table in the application, and not query it at all.
- Rows in A are quite large, and many rows of B are selected for each row of A.
This will cause useless network traffic and load as rows from A are duplicated many times in each result row.
- Rows in B are quite large, and there are few distinct b_id's in A
Same as above, except this time the same few rows from B are duplicated in the result set.
In the previous two examples, it could be useful to perform the query on table A, then gather a set of unique b_id's from the result, and SELECT FROM b WHERE b_id IN (list).
- Data structure and ORMs
If each table contains a different object type, and they have a "belongs to" relationship (like category and product) and you use an ORM which will instantiate objects for each selected row, then perhaps you only want one instance of each category, and not one per selected product. In this case, you could select the products, gather a list of unique category_ids, and select the categories from there. The ORM may even do that for you behind the scene.
- Complicated aggregates
Sometimes, you want some stuff, and some aggregates of other stuff related to the first stuff, but it just won't fit in a neat GROUP BY, or you may need several ones.
So basically, usually the join works better in the database, so that should be the default. If you do it in the application, then you should know why you're doing it, and decide it's a good reason. If it is, then fine. I gave a few reasons, based on performance, data model, and SQL constraints, these are only examples of course.