I have created this simple BST in Ruby:
class Node
attr_accessor :data, :left, :right
def initialize data
@data = data
end
end
def insert root, data
current = root
loop do
node = current.data < data ? current.right : current.left
break if node.nil?
current = node
end
if current.data < data
current.right = Node.new(data)
else
current.left = Node.new(data)
end
end
Note: I don't use recursion because I get a "stack too deep" exception when the BST becomes large.
Then I try to insert about 6M records:
root = Node.new('root')
insert root, "something"
insert root, "another"
# ...
It takes about 30 minutes to complete on my MacOS, which seems really too much!
Why it takes so long (with SQLite I can index in 30 seconds the same data)?
Can you suggest any optimization to the above code?
CodePudding user response:
Is Ruby slow?
Here's a StackOverflow question with highly detailed answers. In short, comparing Ruby's performance with SQLite's is pointless; only one of these tools is optimised to work with data.
Optimizing the approach
Since you're using a BST, each insertion takes O(n)
time in the worst case (which is a skewed tree). Inserting n
nodes would have O(n^2)
complexity, which might explain the high running time. You could replace the BST with a height balanced BST (also called AVL tree), where all insertion/deletion/search operations can be done in O(log n)
CodePudding user response:
I'm not sure if you strictly need a BST, but in case you can consider an alternative, having your data stored in an array and having it sorted (hopefully only once) would be way faster.
The following takes 2 seconds on my machine, whereas I stopped your BST implementation after 30 seconds.
data = []
6_000_000.times{|i|
data << rand(1_000_000)
}
# data << 3000 #force 3000 to be there...
data.sort! # only need to sort once
p data.bsearch{ |x| 3000 <=> x }
If you need to keep adding data, maybe re-sorting all the time would still be better.