- Rails v5.2.4.3
- Ruby v2.3.3
We have a Workspace table and a WorkspaceGroup table, and a many-to-many relationship between these two tables via a join table named WorkspaceGroupAssociation (a workspace is like a project in our domain model). So a project can belong to many groups, and a group can have many projects.
We have some groups which have many thousands of projects, and in our observability tooling, we noticed recently that the following old code was very slow (note that the below code is a simplified version of the method):
class WorkspaceGroup < ApplicationRecord
  def add_workspaces(workspace_ids)
    self.workspace_ids |= workspace_ids
  end
end
We had one group that already had about 5,000 workspaces, and adding new workspace IDs to it took upwards of 2 minutes.
Our initial approach was to change self.workspace_ids |= workspace_ids to self.workspace_ids = workspace_ids, but this didn't move the needle at all in terms of performance. Then we tried the following, and it worked great:
def add_workspaces(workspace_ids)
  existing_workspaces = self.workspaces
  workspaces_to_add = Workspace.where(id: workspace_ids) - existing_workspaces
  workspaces_to_add.each do |workspace|
    self.workspaces << workspace
  end
end
The author of the above code said that the performance improvement was due to the fact that we aren't instantiating 5,000 new instances of the Workspace model in the new code, but we were in the old code.
I'm curious why that would be true of the old code, but not the new code. Why does self.workspace_ids = result in instantiating thousands of new ActiveRecord instances, but self.workspaces << does not?
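To make the difference between the two versions concrete, here is a minimal plain-Ruby sketch of the ID lists each one works with. It does not touch ActiveRecord, and the IDs and variable names are made up for illustration; it only shows that the old code hands the full union to the setter, while the new code appends just the missing entries.

```ruby
# Plain-Ruby sketch; the IDs below are hypothetical stand-ins for real records.
existing_ids = (1..5_000).to_a               # IDs already on the group
incoming_ids = [4_999, 5_000, 5_001, 5_002]  # IDs passed to add_workspaces

# Old code: the union is the complete list, which is then assigned wholesale.
assigned_ids = existing_ids | incoming_ids

# New code: only the IDs that are not already present get appended.
ids_to_add = incoming_ids - existing_ids

puts "old code assigns #{assigned_ids.size} IDs"
puts "new code appends #{ids_to_add.size} IDs"
```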
CodePudding user response:
+ does this for a collection in Rails:
def +(other)
  Collection.new(to_a + other.to_a)
end
While << does this:
def <<(*records)
  proxy_association.concat(records) && self
end
My guess is that creating a new Collection is a more expensive operation than doing a concatenation.
https://api.rubyonrails.org/classes/Rails/Initializable/Collection.html#method-i-2B
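As a rough analogy only, the same distinction exists on plain Ruby arrays: + builds a new object, while concat appends to the existing one. The sketch below uses ordinary arrays, not ActiveRecord collections, so it says nothing about the actual cost inside Rails; the variable names are just for illustration.

```ruby
# Analogy with plain Ruby arrays (not ActiveRecord collections).
a = [1, 2, 3]
b = [4, 5]

# Array#+ returns a brand-new array and leaves the receiver untouched.
summed = a + b
summed_is_new = !summed.equal?(a)
a_after_plus = a.dup

# Array#concat appends in place and returns the receiver itself.
appended = a.concat(b)
appended_is_same = appended.equal?(a)

puts "+ returned a new object: #{summed_is_new}"
puts "concat returned the receiver: #{appended_is_same}"
```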