Is it possible to calculate the MD5 of a NumPy image before saving it?

Time:05-31

I am trying to save files with their MD5 as the filename. I generate the images as NumPy arrays, and sometimes the same image is generated more than once, so I want to calculate the MD5 beforehand in order to either overwrite the existing image or skip saving it.

The problem is that the hash I get from the NumPy array is not the same as the hash of the image file that is finally saved. I am using the following code:

hashlib.md5(array.astype("uint8")).hexdigest()

Is it possible to calculate the MD5 hash from the NumPy array, or do I need to save the file with a random name and rename it afterwards?

Thanks

CodePudding user response:

Following the comment, and assuming that you are saving a NumPy array and not an image file, you could just do:

import hashlib
import numpy as np
digest = hashlib.md5(array.tobytes()).hexdigest()  # hex string, usable as a filename
np.save(digest, array)  # np.save appends the .npy extension
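One caveat with hashing `array.tobytes()` alone: the raw bytes carry neither the shape nor the dtype, so two differently shaped arrays with the same elements produce the same digest. The following is a minimal sketch of one way around that, not the answerer's method; the helper name `array_md5` is hypothetical.

```python
import hashlib
import numpy as np

def array_md5(array):
    """Hex MD5 over dtype, shape and raw bytes (hypothetical helper)."""
    arr = np.asarray(array)
    h = hashlib.md5()
    h.update(str(arr.dtype).encode())   # e.g. "uint8"
    h.update(str(arr.shape).encode())   # e.g. "(3, 4)"
    h.update(arr.tobytes())             # element bytes in C order
    return h.hexdigest()

a = np.arange(12, dtype=np.uint8).reshape(3, 4)
b = a.reshape(4, 3)  # same bytes, different shape

same_raw = hashlib.md5(a.tobytes()).hexdigest() == hashlib.md5(b.tobytes()).hexdigest()
same_full = array_md5(a) == array_md5(b)
print(same_raw, same_full)
```

Whether the shape and dtype should be part of the digest depends on what counts as "the same image" in your application.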

What follows is highly INADVISABLE!

If you instead have to save the image, you should, in order:

  1. Save the image (.png, for example)
  2. Digest the file content with hashlib
  3. Delete existing image, if any
  4. Rename your new image

In code:

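import binascii
import hashlib
import os
from matplotlib.image import imsave

# Save under a temporary name first; image_array is the array from the question
imsave('myimage.jpg', image_array)
# Read back the encoded file and hash its bytes
with open('myimage.jpg', 'rb') as f:
    ba = f.read()
_hash = hashlib.md5(ba).digest()
new_filename = binascii.hexlify(_hash).decode() + '.jpg'
# Remove any existing file with the same name, then rename the new one
if os.path.exists(new_filename):
    os.remove(new_filename)
os.rename('myimage.jpg', new_filename)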

Please avoid doing this, for the reasons @Mark gave in a comment, reproduced here:

You are calculating the MD5 digest of a JPEG-compressed file, so you will likely not detect that it corresponds to another, identical NumPy array if the JPEG is written 1) by a different library, 2) by a different version of the same library, 3) with different quality settings, 4) on a different date, if the date is embedded in the metadata, or 5) in a different image format such as PNG.
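Given those caveats, one alternative is to derive the filename from the pixel data rather than from the encoded file, and to skip the write when that name already exists. This is only a sketch under assumptions: `save_if_new` and its `write` parameter are hypothetical names, and `np.save` stands in for an image writer so the example runs with NumPy alone. Any callable taking `(path, array)`, such as `matplotlib.image.imsave`, should fit the same slot.

```python
import hashlib
import os
import tempfile
import numpy as np

def save_if_new(array, directory, write, ext):
    """Write array to <directory>/<md5 of pixels><ext> unless that file exists.

    `write(path, array)` is a caller-supplied saver (hypothetical parameter).
    Returns (path, written), where written is False if the file was already there.
    """
    # Assumes 8-bit image data, as in the question
    pixels = np.ascontiguousarray(array, dtype=np.uint8)
    h = hashlib.md5(str(pixels.shape).encode())
    h.update(pixels.tobytes())
    path = os.path.join(directory, h.hexdigest() + ext)
    if os.path.exists(path):
        return path, False
    write(path, pixels)
    return path, True

with tempfile.TemporaryDirectory() as tmp:
    img = np.zeros((4, 4, 3), dtype=np.uint8)
    first = save_if_new(img, tmp, np.save, ".npy")
    second = save_if_new(img.copy(), tmp, np.save, ".npy")  # identical pixels
```

Here the name depends only on the array contents, so the encoder, its version, quality settings and embedded metadata no longer affect it. The name will not equal the MD5 of the saved file itself.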
