I'm messing around with (a bit old) SimIt-ARM 3.0. The SimIt-ARM 3.0 has issgen (ISS generator) based on (ARMv5) ISA defined using internal language defined using Lex specification & Yacc grammar. In particular it has a symbol_table
class based on STL hash_map
(which is not part of the C Standard Library, and which is now deprecated).
The problem: the hash_map
behaves unexpectedly. Here is a demo using the original test. We see that output differs between -O0
and -O1
.
Can someone help me to figure out what it the root cause?
UPD: here is the MRE:
#ifdef __GNUC__
#if __GNUC__ < 3
#include <hash_map.h>
namespace Sgi { using ::hash_map; using ::hash; }; // inherit globals
#else
#include <ext/hash_map>
#if __GNUC_MINOR__ == 0 && __GNU_C__ == 3
namespace Sgi = std; // GCC 3.0
#else
namespace Sgi = ::__gnu_cxx; // GCC 3.1 and later
#endif
#endif
#endif
#include <string>
#include <iostream>
#include <vector>
#include <cstring>
struct strEql
{
bool operator()(const char* sz1, const char* sz2)
{
return strcmp(sz1,sz2) == 0;
}
};
typedef Sgi::hash_map<const char *, unsigned int,
Sgi::hash<char *>, strEql> hash_map;
hash_map hasher;
unsigned int idx = 1;
void insert(const std::string& key)
{
hash_map::iterator it = hasher.find(key.c_str());
if (it==hasher.end())
{
hasher[key.c_str()] = idx ;
}
}
void print_hasher(void)
{
for(hash_map::iterator it = hasher.begin(); it != hasher.end(); it )
{
std::cout << "fisrt: " << it->first << ", second: " << it->second << std::endl;
}
}
int main(void)
{
insert("xxx");
insert("yyy");
print_hasher();
}
Invocations and output:
$ g mre.cpp -Wno-deprecated -O0
fisrt: xxx, second: 1
fisrt: yyy, second: 2
$ g mre.cpp -Wno-deprecated -O1
fisrt: xxx, second: 1
CodePudding user response:
In insert
you are creating a temporary std::string
key
, the pointer returned from key.c_str()
is therefore only valid until the end of the function.
Even if it was still valid then multiple calls to insert
with the same string wouldn't necessarily produce the same address, equally calls with different strings might produce the same address so your hash table will be completely broken.
If you really want to hash based on the addresses you need to make sure you are only using string literals which will always have the same address (though I don't think there is any guarantee that two literals with the same value will have the same address) and will be valid for the duration of your program:
void insert(const char* key)
{
hash_map::iterator it = hasher.find(key);
if (it==hasher.end())
{
hasher[key] = idx ;
}
}
https://godbolt.org/z/6r18T16P9
You still need to be careful as its easily possible to pass a non-literal to insert
and you'll be back to your original problem:
insert(std::string("yyy").c_str());