Memory efficient computation of md5sum of a file in vlang

Time:05-31

The following code reads a whole file into a byte array and computes the md5sum of that array. It works, but I would like to find a solution in V that needs less RAM. Thanks for your comments!

import os
import crypto.md5

b := os.read_bytes("file.txt") or {panic(err)}

s := md5.sum(b).hex()

println(s)

I also tried the following, without success:

import os
import crypto.md5
import io

mut f := os.open_file("file.txt", "r")?

mut h := md5.new()

io.cp(mut f, mut h)?

s := h.sum().hex()

println(s) // does not return the correct md5sum

CodePudding user response:

Alrighty. This is what you're looking for. It produces the same result as md5sum and is only slightly slower. block_size is directly related to both the amount of memory used and the speed at which the checksum is computed: decreasing block_size lowers the memory footprint but takes longer to compute, and increasing block_size has the opposite effect. The value used here, 64 * 65535 bytes, is about 4 MiB. I tested on a 2GB Manjaro disc image and can confirm the memory usage is very low.

Note: this runs noticeably slower when compiled without the -prod flag. With -prod, the V compiler applies extra optimizations so that the production build runs faster.

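import crypto.md5
import io
import os

fn main() {
    println(hash_file('manjaro.img')?)
}

// Size in bytes of the read buffer (about 4 MiB); this bounds memory use.
const block_size = 64 * 65535

// hash_file returns the hex MD5 digest of the file at `path`, reading it
// block by block instead of loading it into memory all at once.
// Note: on V releases that distinguish options (`?`) from results (`!`),
// the `?` propagation used here may need to be written as `!`.
fn hash_file(path string) ?string {
    mut file := os.open(path)?
    defer {
        file.close()
    }
    // One buffer is allocated up front and reused for every read.
    mut buf := []u8{len: block_size}
    mut r := io.new_buffered_reader(reader: file)
    mut digest := md5.new()
    for {
        // read returns the number of bytes read; it errors at end of file.
        x := r.read(mut buf) or { break }
        // Feed only the bytes that were actually read into the digest.
        digest.write(buf[..x])?
    }
    return digest.checksum().hex()
}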

CodePudding user response:

To conclude what I've learned from the comments:

  • V is a programming language with typed arguments
  • md5.sum takes a byte array argument, and not something that is a sequence of bytes, e.g. read from a file as-you-go.
  • There's no alternative to md5.sum

So, you will have to implement MD5 yourself. Maybe the standard library is open source and you can build upon that! Or you can bind any of the existing (e.g. C) implementations of MD5 and feed in bytes as you read them, in chunks of 512 bits (2⁶ = 64 bytes).

EDIT: I don't know V, so it's hard for me to judge, but it looks like Digest.write is a method for pushing data through the MD5 calculation piece by piece. Maybe that, together with a loop reading bytes from the file, is the solution?
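
The idea in the EDIT can be checked without touching the file system. The following is a minimal sketch, not a definitive implementation: `md5_chunked` is a hypothetical helper introduced only for illustration, and it assumes that the installed `crypto.md5` module exposes `md5.new()`, `Digest.write` and `Digest.sum(b_in []u8)` in addition to the one-shot `md5.sum`. It feeds the same bytes to the digest in chunks of several sizes and asserts that the result matches hashing everything in one call.

```v
import crypto.md5

// md5_chunked is a hypothetical helper (not part of vlib): it feeds `data`
// to an MD5 digest `chunk` bytes at a time and returns the hex digest.
// `chunk` must be greater than zero.
fn md5_chunked(data []u8, chunk int) string {
	mut d := md5.new()
	mut i := 0
	for i < data.len {
		// Clamp the final chunk to the end of the data.
		end := if i + chunk < data.len { i + chunk } else { data.len }
		d.write(data[i..end]) or { panic(err) }
		i = end
	}
	// Assumption: sum appends the digest to the slice passed in,
	// so an empty slice yields just the 16 digest bytes.
	return d.sum([]u8{}).hex()
}

fn main() {
	data := 'The quick brown fox jumps over the lazy dog'.bytes()
	one_shot := md5.sum(data).hex()
	// Chunk sizes that do and do not line up with MD5's 64-byte block.
	for chunk in [1, 7, 64, 1000] {
		assert md5_chunked(data, chunk) == one_shot
	}
	println(one_shot)
}
```

If the assertions hold, the chunks do not have to be 64 bytes long: the digest buffers partial blocks internally, so the read size can be chosen purely for memory and speed.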
