Choosing a hash function for an API call to determine when to update

Time: 05-24

Context:
In the design of our application, some frequently used APIs return large responses (about 3 to 5 MB), for example, a call that fetches the profiles of 1,000 users.

Moreover, more often than not, the response stays unchanged. We want to keep the data in the front-end store (e.g., a Redux store) as a JSON object. When the FE calls the BE to retrieve the data, it will pass along a checksum computed over that JSON object, let's say with MD5. The BE will compute the MD5 hash of its response in the same way and return the full response only if the two hash values differ. Otherwise, it will return something like HTTP.status.OK.
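A minimal Python sketch of the exchange described above, assuming both sides hash a canonical serialization (sorted keys, no extra whitespace) so that identical data always produces identical bytes. The names `canonical_json`, `fingerprint`, and `handle_request` are hypothetical. For the unchanged case the sketch returns 304 Not Modified, the status HTTP itself uses for the analogous ETag / If-None-Match exchange; the question only says "something like HTTP.status.OK".

```python
import hashlib
import json
from typing import Any, Optional, Tuple


def canonical_json(obj: Any) -> bytes:
    """Serialize obj deterministically: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def fingerprint(obj: Any, algorithm: str = "md5") -> str:
    """Hex digest of the canonical JSON; MD5 mirrors the question's example."""
    return hashlib.new(algorithm, canonical_json(obj)).hexdigest()


def handle_request(
    current_data: Any, client_hash: Optional[str]
) -> Tuple[int, Optional[Any], str]:
    """Hypothetical BE handler: returns (status, body, server_hash)."""
    server_hash = fingerprint(current_data)
    if client_hash == server_hash:
        # Client's copy is current: skip sending the multi-megabyte body.
        return 304, None, server_hash
    # First request, or the data changed: send the full response.
    return 200, current_data, server_hash


# Simulated FE flow: an initial fetch, then a revalidation with the stored hash.
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
status, body, stored_hash = handle_request(users, None)
print(status)  # 200
status, body, _ = handle_request(users, stored_hash)
print(status)  # 304
```

One design choice worth weighing: rather than having the FE re-hash its stored copy (which requires JavaScript and the BE language to serialize JSON byte-for-byte identically, including number formatting), the BE can send its hash along with the response and the FE can simply store and echo it back. That is essentially how HTTP ETags work.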

I wonder what the most appropriate hash function for this kind of operation would be, and what criteria should guide the choice. From what I have searched, there seems to be no universal answer. It should be fast, of course, but I suspect the time to compute a hash is negligible compared to the database operations involved. The chance of a collision also seems negligible.
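The intuition about collisions can be made concrete under a simplifying assumption: treat a b-bit hash as an ideal function whose outputs behave like independent uniform random values (real functions only approximate this, and MD5 fails it for deliberately crafted inputs). The only collision that hurts this scheme is a changed response that hashes to exactly the value the client already holds, which would make the BE wrongly report "unchanged":

```latex
\begin{aligned}
P(\text{missed update per change}) &\approx 2^{-b},\\
P(\text{any missed update in } N \text{ changes}) &\le N \cdot 2^{-b},\\
b = 128\ (\text{MD5}):\quad 2^{-128} &\approx 2.9 \times 10^{-39},\\
b = 224\ (\text{SHA-224}):\quad 2^{-224} &\approx 3.7 \times 10^{-68}.
\end{aligned}
```

Even over a million changes, the MD5 bound stays below 10^{-32}, so for accidental (non-adversarial) changes the digest size is unlikely to be the deciding criterion.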

Answer:

Any hash function with a low collision probability will work for this application. You're using the hash only to detect changes, not to protect your data against deliberate modification.

That said, you should avoid MD5, if only for the sake of your code reviewers and your boss. It is no longer considered secure, and some people object to seeing it in any new code.

SHA-224 (part of the SHA-2 family) should be fine on all counts: performance, security, and boss satisfaction.
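Switching to SHA-224 is typically a one-line change. A minimal Python sketch using the standard library's `hashlib.sha224`; the helper name `response_fingerprint` and the canonical-JSON step (sorted keys, no whitespace) are assumptions for illustration, not part of the answer.

```python
import hashlib
import json
from typing import Any


def response_fingerprint(obj: Any) -> str:
    """SHA-224 hex digest of a canonical JSON serialization of obj."""
    # Assumed canonical form: sorted keys and no whitespace, so that key
    # order in the stored object does not change the hash.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha224(payload).hexdigest()


profile = {"id": 42, "name": "Ann"}
print(len(response_fingerprint(profile)))  # 56 hex chars = 224 bits
print(len(hashlib.md5(b"x").hexdigest()))  # 32 hex chars = 128 bits, for comparison
```

SHA-224 costs essentially the same to compute as SHA-256, since it runs the same compression function with different initial values and truncates the output to 224 bits. Compared with MD5, the visible difference is a longer digest to store and send (56 hex characters instead of 32).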
