Home > Software design >  Search time when storing hashset of all cases of SHA1 in database
Search time when storing hashset of all cases of SHA1 in database

Time:10-11

I am researching for sha1 collision.

The size of the hashset in all cases of sha1 is 2^160. Assuming you put it in the aws database, how long will it take to retrieve the record? Is there an area defined as how many seconds it takes to retrieve a record in mysql or mssql documentation? Also, what database type of aws do you think is good for a huge amount of 2^160?

Thanks in advance for your reply.

CodePudding user response:

There is no way to store that much data, on aws or anywhere else. Nor is it possible to generate that much data in any conceivable amount of time.

CodePudding user response:

To retrieve one row from any suitably indexed table will take "a few" milliseconds.

Said another way, a single program can fetch hundreds, maybe thousands, of row, one at a time, from the table indexed by that sha1 column in one second.

To understand how this is possible, read about how a BTree works.

The number of bits in the key (160) has very little impact on what I said above. And 2^160 is simply the maximum number of distinct values for that key.

The only significant difference between looking it up on AWS and our personal laptop is the network latency.

If you are inserting a row but want to first check for the sha1 already being there, simply do the insert, then check for a "duplicate key" error. Or use INSERT IGNORE... Or use INSERT ... ON DUPLICATE KEY UPDATE... The Insert must (effectively) do a SELECT first to see if the will be a dup key -- essentially no extra effort.

Think back to how your grandparents used to look up a word in a book called a dictionary or telephone directory or atlas.

CodePudding user response:

I just want to calculate an imaginary estimated time

About 4 seconds.

Imagine B-Tree on a 160 bit key. If each Page is 8000 bytes, and the page pointer is also 160 bits that's 25 key values per page. 2^160 ~ 25^35, so there would be 35 levels in the B-Tree, so 35 IO's to get to the leaf level.

This would necessarily be a planet-sized distributed system, so allow 100ms per IO, and that's something under 4 seconds.

  • Related