I am currently developing an open-source, text-based storage utility called WaterBase. The aim is to make it easy to save and access persistent key-value data, similar to Android's SharedPreferences.
The data storage scheme is like this:
type:key:value
The problem I am facing is that if someone uses : as a character in their key or value, the code breaks because it treats every : as a separator.
How do I overcome this behavior? I don't want to restrict the use of separators in user data. I looked into encoding but couldn't find any working code that doesn't rely on external libraries. You can have a look in the .h file here.
A mechanism that can be easily implemented in all languages instead of just C would be better so as to diversify the use case.
CodePudding user response:
If you want to avoid reserving any special characters in the output string, you need to store each string's length ahead of its content. You could use an approach similar to name mangling: store the length of the next entry as a decimal integer, followed by a separator, followed by the actual content:
Example
A string is stored as
<string length(decimal)> '_' <string content>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

struct Entry
{
    std::string type;
    std::string key;
    std::string value;
};

// Writes "<length>_<content>", e.g. "foo" becomes "3_foo".
void WriteMangled(std::ostream& s, std::string const& str)
{
    s << str.length() << '_' << str;
}

// Reads one "<length>_<content>" field; sets failbit on a malformed prefix.
void ParseMangled(std::istream& s, std::string& str)
{
    std::size_t size;
    char c;
    if ((s >> size) && s.get(c) && c == '_')
    {
        str.resize(size, '\0');
        // Unformatted read: spaces, ':' and '_' in the content are kept as-is.
        s.read(str.data(), static_cast<std::streamsize>(size));
    }
    else
    {
        s.setstate(std::ios::failbit);
    }
}

std::ostream& operator<<(std::ostream& s, Entry const& entry)
{
    WriteMangled(s, entry.type);
    WriteMangled(s, entry.key);
    WriteMangled(s, entry.value);
    return s;
}

std::istream& operator>>(std::istream& s, Entry& entry)
{
    ParseMangled(s, entry.type);
    ParseMangled(s, entry.key);
    ParseMangled(s, entry.value);
    return s;
}

int main() {
    std::ostringstream oss;
    oss << Entry{ "_Td$a", "8X0_8", "foo bar baz"};
    std::string str = std::move(oss).str(); // rvalue str() requires C++20
    std::cout << str << '\n'; // prints 5__Td$a5_8X0_811_foo bar baz
    std::istringstream iss(std::move(str));
    Entry e;
    iss >> e;
    std::cout << e.type << '\n' << e.key << '\n' << e.value << '\n';
}
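Since the question asks for something that ports easily to other languages, the same length-prefix idea can also be sketched on plain strings, without streams. This is only an illustration under assumptions: EncodeField and DecodeField are hypothetical names, not part of WaterBase, and overflow of very large length prefixes is not handled.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: appends "<length>_<content>" to out.
void EncodeField(std::string& out, const std::string& field)
{
    out += std::to_string(field.size());
    out += '_';
    out += field;
}

// Hypothetical helper: decodes one field starting at pos and advances pos
// past it. Returns std::nullopt if the prefix is malformed or the input is
// shorter than the announced length. Overflow of the length is not checked.
std::optional<std::string> DecodeField(const std::string& in, std::size_t& pos)
{
    std::size_t len = 0;
    std::size_t i = pos;
    bool hasDigit = false;
    while (i < in.size() && in[i] >= '0' && in[i] <= '9')
    {
        len = len * 10 + static_cast<std::size_t>(in[i] - '0');
        ++i;
        hasDigit = true;
    }
    if (!hasDigit || i >= in.size() || in[i] != '_')
    {
        return std::nullopt; // no length, or missing '_' after the length
    }
    ++i; // skip the '_'
    if (in.size() - i < len)
    {
        return std::nullopt; // truncated content
    }
    std::string field = in.substr(i, len);
    pos = i + len;
    return field;
}
```

Because the decoder consumes exactly the announced number of characters, the content may contain :, _ or digits without any escaping. The loop and index arithmetic use nothing C++-specific, so the same logic should translate directly to C or other languages.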
Adding an escape character could be simpler, though: for example, use the backslash to mark the next character as a literal character rather than a special one such as a separator. The drawback is that, when writing the output, you have to replace every backslash in the original strings with a double backslash and prefix every separator in the data with a backslash.
constexpr char EscapeChar = '\\';
constexpr char SeparatorChar = ':';

// Reads one field up to the next unescaped separator or the end of input.
// Entry is the struct defined in the first example.
bool ReadEscapedString(std::istream& s, std::string& str)
{
    char c;
    // get() is used instead of >> so that whitespace in the data is not skipped.
    while (s.get(c))
    {
        switch (c)
        {
        case EscapeChar:
            if (!s.get(c))
            {
                return false; // could not read escaped char
            }
            break; // c now holds the escaped character, taken literally
        case SeparatorChar:
            return true; // unescaped separator ends the field
        default:
            break;
        }
        str.push_back(c);
    }
    return true;
}

std::istream& operator>>(std::istream& s, Entry& entry)
{
    ReadEscapedString(s, entry.type)
        && ReadEscapedString(s, entry.key)
        && ReadEscapedString(s, entry.value);
    return s;
}

int main() {
    // Note: Raw string literal for easier readability, see https://en.cppreference.com/w/cpp/language/string_literal
    std::istringstream iss(R"(foo\:bar:\:baz\:\:a:x)");
    Entry e;
    iss >> e;
    std::cout << e.type << '\n' << e.key << '\n' << e.value << '\n';
}
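The example above only reads escaped input. A matching writer could look like the following sketch; EscapeField and EscapeEntry are hypothetical names introduced here for illustration, and the separator and escape characters are assumed to be the same : and backslash used by the reader.

```cpp
#include <string>

// Hypothetical helper: prefixes every separator and every escape character
// in field with the escape character, so a reader can take them literally.
std::string EscapeField(const std::string& field,
                        char separator = ':',
                        char escape = '\\')
{
    std::string out;
    out.reserve(field.size());
    for (char c : field)
    {
        if (c == separator || c == escape)
        {
            out.push_back(escape);
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical helper: builds one "type:key:value" record from raw strings.
std::string EscapeEntry(const std::string& type,
                        const std::string& key,
                        const std::string& value)
{
    return EscapeField(type) + ':' + EscapeField(key) + ':' + EscapeField(value);
}
```

For instance, EscapeEntry("foo:bar", ":baz::a", "x") produces the same text that the reader example parses. If records are additionally stored one per line, newline characters in the data would need similar treatment, which this sketch does not cover.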