I am looking for the most efficient way to convert a huge number of objects (1M instances) to another object type. Unfortunately, I have no control over what I get as input (the million objects).
So far I've tried each_slice, but it does not show much improvement!
It looks like this:
expected_objects_of_type_2 = []
huge_array.each_slice(3000) do |batch|
  batch.each do |object_type_1|
    expected_objects_of_type_2 << NewType2.new(object_type_1)
  end
end
Any idea?
Thanks!
CodePudding user response:
I did a quick test with a few different methods of looping the array and measured the timings:
huge_array = Array.new(10000000){rand(1..1000)}
a = Time.now
string_array = huge_array.map{|x| x.to_s}
b = Time.now
puts b-a
Same with:
sa = []
huge_array.each do |x|
  sa << x.to_s
end
and
sa = []
huge_array.each_slice(3000) do |batch|
  batch.each do |x|
    sa << x.to_s
  end
end
I don't know what you are converting, so I used a simple integer-to-string conversion.
Timings (in seconds):
Map: 1.7
Each: 2.3
Slice: 3.2
So apparently your slice overhead makes things slower. Map seems to be the fastest (internally it is just a loop, but it writes into an output array whose length is known up front). The << calls seem to slow things down a bit.
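For anyone who wants to repeat the comparison, here is a minimal sketch using Ruby's standard Benchmark module instead of manual Time.now arithmetic. The method names (sample_data, convert_with_map, and so on) are made up for this example, and the array is smaller than the 10,000,000 elements used above so that it runs quickly. Absolute numbers will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical helper: random integers, like the test data above.
# The default size is smaller than the original 10,000,000 on purpose.
def sample_data(n = 200_000)
  Array.new(n) { rand(1..1000) }
end

# Variant 1: map builds the result array directly.
def convert_with_map(array)
  array.map { |x| x.to_s }
end

# Variant 2: each plus << onto a growing array.
def convert_with_each(array)
  result = []
  array.each { |x| result << x.to_s }
  result
end

# Variant 3: each_slice plus an inner each, as in the question.
def convert_with_slices(array, slice_size = 3000)
  result = []
  array.each_slice(slice_size) do |batch|
    batch.each { |x| result << x.to_s }
  end
  result
end

data = sample_data
Benchmark.bm(6) do |bm|
  bm.report('map')   { convert_with_map(data) }
  bm.report('each')  { convert_with_each(data) }
  bm.report('slice') { convert_with_slices(data) }
end
```

Raise the size passed to sample_data to get closer to the original test.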
So if each object needs its own conversion, you are stuck with O(n) complexity and can't speed things up by much. Just avoid overhead.
Depending on your data, sorting it to exploit caching effects might help, as might skipping duplicates if a lot of the data is identical, but we have no way to tell without knowing your actual conversions.
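To illustrate the duplicates idea, here is a rough sketch that converts each distinct value only once by caching results in a Hash. The convert method is a hypothetical stand-in for the real conversion, and this assumes that equal inputs (by hash and eql?) always produce equal outputs. Whether it pays off depends on how expensive the conversion is and how many duplicates there are.

```ruby
# Hypothetical stand-in for the real (possibly expensive) conversion.
def convert(value)
  value.to_s
end

# Converts every element, but runs convert only once per distinct value.
def convert_all_memoized(array)
  # The default block runs on a cache miss and stores the result.
  cache = Hash.new { |hash, key| hash[key] = convert(key) }
  array.map { |x| cache[x] }
end
```

Note that identical inputs end up sharing the same output object, so this is only safe if the converted objects are not mutated afterwards.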
CodePudding user response:
I would treat each slice in its own thread:
huge_array.each_slice(3000) do |batch|
  Thread.new do
    batch.each do |object_type_1|
      expected_objects_of_type_2 << NewType2.new(object_type_1)
    end
  end
end
Then you have to wait for the threads to terminate using join. To do that, collect the threads in an array and join each of them.
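A possible sketch of the whole thing, with the threads collected in an array. It differs from the snippet above in one respect: instead of appending to a shared array and calling join, each thread returns its own converted batch and the results are gathered with Thread#value, which waits for the thread like join does and keeps the original order. NewType2 is defined here as a hypothetical Struct only so the example runs on its own.

```ruby
# Hypothetical stand-in for the NewType2 class from the question.
NewType2 = Struct.new(:source)

def convert_in_threads(huge_array, slice_size = 3000)
  # One thread per slice; each thread converts its own batch.
  threads = huge_array.each_slice(slice_size).map do |batch|
    Thread.new do
      batch.map { |object_type_1| NewType2.new(object_type_1) }
    end
  end

  # Thread#value waits for the thread to finish and returns the
  # block's result, so batches come back in their original order.
  threads.flat_map(&:value)
end
```

Keep in mind that on standard MRI Ruby the global VM lock lets only one thread run Ruby code at a time, so this is likely to help mainly when the conversion spends time waiting on I/O rather than on pure computation.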