I want to calculate the SHA-1 hash of any given file in C++ using the OpenSSL library.
I have spent almost 3 days reading every article I could find on the internet about this (including everything on Stack Overflow).
I finally got my program to work, but the hash it generates for a given file is not what it should be.
My code is somewhat similar to the examples found here and here, but easier to read and to reuse in the rest of my program.
Also, I want to use C++ code, not C code as in the links above. Second, they use:
SHA256_Init(&context);
SHA256_Update(&context, (unsigned char*)input, length);
SHA256_Final(md, &context);
which are deprecated in the new/current OpenSSL version (3.0 or so, I think).
So I think this question will help the many other readers who hit the same problems with the new OpenSSL version and can no longer use the old code samples.
This is my C++ code, written to read huge files in chunks without loading them fully into memory (I hope it helps future readers of this post because it has many useful lines, but it is not fully working, as you will see):
bool hashFullFile(const std::string& FilePath, std::string &hashed, std::string &hash_type) {
bool success = false;
EVP_MD_CTX *context = EVP_MD_CTX_new();
//read file in chunks:
const int BUFFER_SIZE = 1024;
std::vector<char> buffer (BUFFER_SIZE + 1, 0);
// check if the file to read from exists and if so read the file in chunks
std::ifstream fin(FilePath, std::ifstream::binary | std::ifstream::in);
if (hash_type == "SHA1") {
if (context != NULL) {
if (EVP_DigestInit_ex(context, EVP_sha1(), NULL)) {
while (fin.good()){
fin.read(buffer.data(), BUFFER_SIZE);
std::streamsize s = ((fin) ? BUFFER_SIZE : fin.gcount());
buffer[s] = 0;
//convert vector of chars to string:
std::string str(buffer.data());
if (!EVP_DigestUpdate(context, str.c_str(), str.length())) {
fprintf(stderr, "Error while digesting file.\n");
return false;
}
}
unsigned char hash[EVP_MAX_MD_SIZE];
unsigned int lengthOfHash = 0;
if (EVP_DigestFinal_ex(context, hash, &lengthOfHash)) {
std::stringstream ss;
for (unsigned int i = 0; i < lengthOfHash; ++i) {
ss << std::hex << std::setw(2) << std::setfill('0') << (int) hash[i];
}
hashed = ss.str();
success = true;
}else{
fprintf(stderr, "Error while finalizing digest.\n");
return false;
}
}else{
fprintf(stderr, "Error while initializing digest context.\n");
return false;
}
EVP_MD_CTX_free(context);
}else{
fprintf(stderr, "Error while creating digest context.\n");
return false;
}
}
fin.close();
return success;
}
And I call it like this in the main function:
std::string myhash;
std::string myhash_type = "SHA1";
hashFullFile(R"(C:\Users\UserName\data.bin)", myhash, myhash_type);
cout<<myhash<<endl;
The problem is that for a given file it calculates, e.g., 169ed28c9796a8065f96c98d205f21ddac11b14e as the hash output, but the same file has the hash:
openssl dgst -sha1 data.bin
SHA1(data.bin)= 1927f720a858d0c3b53893695879ae2a7897eedb
as generated by the OpenSSL command line and also by any online hashing site.
I can't figure out what I am doing wrong, since my code seems to be correct.
Please help.
Thank you very much in advance!
Answer:
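Your EVP attempt passes each chunk through an intermediate std::string, which is both unnecessary and the likely cause of the wrong digest: std::string str(buffer.data()) stops at the first zero byte, so any chunk of a binary file that contains '\0' is only partially hashed. Pass buffer.data() and fin.gcount() straight to EVP_DigestUpdate instead. Finally, the function should return the digest as a vector of bytes and let the caller do with it what they want.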
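To see why an intermediate string can change the hash of a binary file, here is a minimal sketch that uses only the standard library. The helper names bytesViaCString and bytesViaLength are hypothetical and exist only for this illustration. It compares how many bytes survive when a buffer is copied through the const char* constructor of std::string versus the constructor that takes an explicit length.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes kept when the buffer is treated
// as a C string. The const char* constructor stops at the first '\0'.
std::size_t bytesViaCString(const char *data)
{
    return std::string(data).length();
}

// Hypothetical helper: number of bytes kept when the length is passed
// explicitly. Embedded '\0' bytes are preserved.
std::size_t bytesViaLength(const char *data, std::size_t n)
{
    return std::string(data, n).length();
}
```

For a chunk such as {'a', 'b', '\0', 'c', 'd', '\0'}, the first helper sees 2 bytes while the second, called with n = 5, sees all 5. Passing the raw buffer and the byte count from gcount() to EVP_DigestUpdate avoids the problem without any string at all.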
Examples using both the EVP API and a BIO chain are shown below.
#include <iostream>
#include <fstream>
#include <algorithm>
#include <array>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>
#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/sha.h>
namespace
{
struct Delete
{
void operator()(BIO * p) const
{
BIO_free(p);
}
void operator()(EVP_MD_CTX *p) const
{
EVP_MD_CTX_free(p);
}
};
using BIO_ptr = std::unique_ptr<BIO, Delete>;
using EVP_MD_CTX_ptr = std::unique_ptr<EVP_MD_CTX, Delete>;
}
std::vector<uint8_t> hashFileEVP(const std::string &fname, std::string const &mdname = "sha1")
{
// will hold the resulting digest
std::vector<uint8_t> md;
// set this to however big you want the chunk size to be
static constexpr size_t BUFFER_SIZE = 1024;
std::array<char, BUFFER_SIZE> buff;
// get the digest algorithm by name
const EVP_MD *mthd = EVP_get_digestbyname(mdname.c_str());
if (mthd)
{
std::ifstream inp(fname, std::ios::in | std::ios::binary);
if (inp.is_open())
{
EVP_MD_CTX_ptr ctx{EVP_MD_CTX_new()};
EVP_DigestInit_ex(ctx.get(), mthd, nullptr);
while (inp.read(buff.data(), BUFFER_SIZE).gcount() > 0)
EVP_DigestUpdate(ctx.get(), buff.data(), inp.gcount());
// size output vector
unsigned int mdlen = EVP_MD_size(mthd);
md.resize(mdlen);
// generate final digest
EVP_DigestFinal_ex(ctx.get(), md.data(), &mdlen);
}
}
return md;
}
std::vector<uint8_t> hashFileBIO(std::string const &fname, std::string const &mdname = "sha1")
{
// the fixed-size read buffer
static constexpr size_t BUFFER_SIZE = 1024;
// will hold the resulting digest
std::vector<uint8_t> md;
// select this however you want.
const EVP_MD *mthd = EVP_get_digestbyname(mdname.c_str());
if (mthd)
{
// open the file and a message digest BIO
BIO_ptr bio_f(BIO_new_file(fname.c_str(), "rb"));
BIO_ptr bio_md(BIO_new(BIO_f_md()));
BIO_set_md(bio_md.get(), mthd);
// chain the bios together. note the chain head returned
// by BIO_push is NOT owned by a smart pointer, because
// each bio in the chain already is.
BIO *bio = BIO_push(bio_md.get(), bio_f.get());
// read through file one buffer at a time.
std::array<char, BUFFER_SIZE> buff;
while (BIO_read(bio, buff.data(), buff.size()) > 0)
; // intentionally empty
// size output buffer
unsigned int mdlen = EVP_MD_size(mthd);
md.resize(mdlen);
// read final digest from md bio.
BIO_gets(bio_md.get(), (char *)md.data(), mdlen);
}
return md;
}
int main()
{
OpenSSL_add_all_digests();
// i have this on my rig. use whatever you want
// or get the name from argv or some such.
static const char fname[] = "dictionary.txt";
auto md1 = hashFileEVP(fname);
std::cout << "hashed with EVP API\n";
BIO_dump_fp(stdout, md1.data(), md1.size());
std::cout << "hashed with BIO chain\n";
auto md2 = hashFileBIO(fname);
BIO_dump_fp(stdout, md2.data(), md2.size());
}
Output
hashed with EVP API
0000 - 0a 97 d6 63 ad a2 e0 39-fd 90 48 46 ab c5 36 12 ...c...9..HF..6.
0010 - 91 bd 2d 8e ..-.
hashed with BIO chain
0000 - 0a 97 d6 63 ad a2 e0 39-fd 90 48 46 ab c5 36 12 ...c...9..HF..6.
0010 - 91 bd 2d 8e ..-.
Output from openssl command line
craig@rogue1 % openssl dgst -sha1 dictionary.txt
SHA1(dictionary.txt)= 0a97d663ada2e039fd904846abc5361291bd2d8e
Note the digests are the same in all three cases.
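If the caller wants a hex string in the same form as the command-line output, the returned byte vector can be converted afterwards. The following is a minimal sketch; toHex is a hypothetical helper name, not part of OpenSSL, and it assumes lower-case output with no separators is what is wanted.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenSSL function): encode a digest as a
// lower-case hex string, two characters per byte.
std::string toHex(const std::vector<std::uint8_t> &bytes)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::uint8_t b : bytes)
    {
        // setw applies to the next output only, so set it per byte;
        // cast to int so the byte is printed as a number, not a char.
        out << std::setw(2) << static_cast<int>(b);
    }
    return out.str();
}
```

With the functions above, something like std::cout << toHex(hashFileEVP(fname)) << '\n'; would print the digest on one line, which makes it easy to compare against openssl dgst.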