How to encode a string in C so as to not collide with separator?

Time:07-31

I am currently developing an open-source Text-based storage utility called WaterBase. The aim is to facilitate easy saving and access of persistent key-value data, like we have in Android SharedPreferences.

The data storage scheme is like this:

type:key:value

The problem I am facing is that if someone uses : as a character in their key or value, the code breaks because it treats every : as a separator.

How do I overcome this behavior? I don't want to restrict the use of separators in user data. I looked into encoding but couldn't find any working code that does not depend on external libraries. You can have a look in the .h file here.

A mechanism that can be easily implemented in all languages instead of just C would be better so as to diversify the use case.

CodePudding user response:

If you want to avoid giving any character a special meaning inside the stored strings, you need to store each string's length ahead of its content. You could use an approach similar to name mangling: store the length of the next entry as a decimal integer, followed by a separator, followed by the actual content:

Example

A string is stored as

<string length (decimal)> '_' <string content>

#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

struct Entry
{
    std::string type;
    std::string key;
    std::string value;
};

void WriteMangled(std::ostream& s, std::string const& str)
{
    s << str.length() << '_' << str;
}

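void ParseMangled(std::istream& s, std::string& str)
{
    size_t size;
    char c;
    // Read the decimal length, then the '_' that ends it.
    if ((s >> size) && (s >> c))
    {
        assert(c == '_');
        // Read exactly `size` chars, so the content may contain any character.
        str.resize(size, '\0');
        s.read(str.data(), size); // non-const data() requires C++17
    }
}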

std::ostream& operator<<(std::ostream& s, Entry const& entry)
{
    WriteMangled(s, entry.type);
    WriteMangled(s, entry.key);
    WriteMangled(s, entry.value);
    return s;
}

std::istream& operator>>(std::istream& s, Entry& entry)
{
    ParseMangled(s, entry.type);
    ParseMangled(s, entry.key);
    ParseMangled(s, entry.value);
    return s;
}

int main() {
    std::ostringstream oss;
    oss << Entry{ "_Td$a", "8X0_8", "foo bar baz"};

    std::string str = std::move(oss).str();
    std::cout << str << '\n';
    
    std::istringstream iss(std::move(str));
    Entry e;
    iss >> e;

    std::cout << e.type << '\n' << e.key << '\n' << e.value << '\n';
}
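Since the question asks for something that ports easily to other languages, here is a minimal sketch of the same length-prefix scheme working on plain strings instead of streams. The names Mangle and Demangle are hypothetical, not part of WaterBase, and the sketch assumes the length fits in a size_t (no overflow check).

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Encode one field as <decimal length> '_' <content>.
std::string Mangle(std::string const& str)
{
    return std::to_string(str.size()) + '_' + str;
}

// Decode one field starting at `pos` and advance `pos` past it.
// Returns std::nullopt (and leaves `pos` unchanged) on malformed input.
std::optional<std::string> Demangle(std::string const& data, std::size_t& pos)
{
    std::size_t size = 0;
    std::size_t i = pos;
    bool haveDigit = false;
    // Parse the decimal length by hand; assumes the value does not overflow.
    while (i < data.size() && data[i] >= '0' && data[i] <= '9')
    {
        size = size * 10 + static_cast<std::size_t>(data[i] - '0');
        ++i;
        haveDigit = true;
    }
    // The length must be present and followed by the '_' marker.
    if (!haveDigit || i >= data.size() || data[i] != '_')
    {
        return std::nullopt;
    }
    ++i;
    // The announced number of chars must actually be available.
    if (data.size() - i < size)
    {
        return std::nullopt;
    }
    std::string out = data.substr(i, size);
    pos = i + size;
    return out;
}
```

With this sketch, the three fields from the example above are stored as 5__Td$a5_8X0_811_foo bar baz. The content is read by count, so it may contain '_', ':' or digits without any escaping.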

Adding an escape char could be simpler, though: for example, use the backslash to mark the next char as literal data rather than a special character such as the separator. The drawback is that, when writing the output, you have to put a backslash in front of every separator in the original strings and replace every backslash with a double backslash.

constexpr char EscapeChar = '\\';
constexpr char SeparatorChar = ':';

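bool ReadEscapedString(std::istream& s, std::string& str)
{
    str.clear();
    char c;
    // Use get() rather than operator>>, which would skip whitespace in the data.
    while (s.get(c))
    {
        switch (c)
        {
        case EscapeChar:
            // The escape char makes the next char literal, whatever it is.
            if (!s.get(c))
            {
                return false; // could not read escaped char
            }
            break;
        case SeparatorChar:
            return true; // an unescaped separator ends the field
        default:
            break;
        }
        str.push_back(c);
    }
    return true; // end of input ends the last field
}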

std::istream& operator>>(std::istream& s, Entry& entry)
{
    ReadEscapedString(s, entry.type)
        && ReadEscapedString(s, entry.key)
        && ReadEscapedString(s, entry.value);
    return s;
}

int main() {
    std::istringstream iss(R"(foo\:bar:\:baz\:\:a:x)"); // Note: Raw string literal for easier readability, see https://en.cppreference.com/w/cpp/language/string_literal
    Entry e;
    iss >> e;

    std::cout << e.type << '\n' << e.key << '\n' << e.value << '\n';
}
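The code above only covers reading. A minimal sketch of the matching writer could look like the following; EscapeField and EncodeEntry are hypothetical names, and the sketch assumes the same backslash escape and ':' separator as above.

```cpp
#include <string>

constexpr char kEscape = '\\';
constexpr char kSeparator = ':';

// Return a copy of `field` in which every escape char and every separator
// is prefixed with the escape char, so a reader can treat it as plain data.
std::string EscapeField(std::string const& field)
{
    std::string out;
    out.reserve(field.size());
    for (char c : field)
    {
        if (c == kEscape || c == kSeparator)
        {
            out.push_back(kEscape);
        }
        out.push_back(c);
    }
    return out;
}

// Join the three escaped fields with unescaped separators: type:key:value.
std::string EncodeEntry(std::string const& type,
                        std::string const& key,
                        std::string const& value)
{
    return EscapeField(type) + kSeparator
         + EscapeField(key) + kSeparator
         + EscapeField(value);
}
```

Calling EncodeEntry("foo:bar", ":baz::a", "x") yields foo\:bar:\:baz\:\:a:x, which is the input used in the reading example. If entries are stored one per line, newlines inside the data would need the same treatment; that is not handled here.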