How can I convert a std::string to UTF-8?

Time:11-23

I need to store the contents of a std::stringstream as a JSON value (using the RapidJSON library), but std::stringstream::str() is not working because it does not return UTF-8 characters. How can I do that?

Example: d["key"].SetString(tmp_stream.str());

CodePudding user response:

rapidjson::Value::SetString accepts a pointer and a length, plus an allocator if RapidJSON should copy the bytes instead of only referencing them. So you have to call it this way:

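std::string stream_data = tmp_stream.str();
// Assuming d is a rapidjson::Document: passing its allocator makes RapidJSON copy the bytes.
d["key"].SetString(stream_data.data(), static_cast<rapidjson::SizeType>(stream_data.size()), d.GetAllocator());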

As others have mentioned in the comments, std::string is a container of char values with no encoding specified. It can contain UTF-8 encoded bytes or any other encoding.

I tested putting invalid UTF-8 data in an std::string and calling SetString. RapidJSON accepted the data and simply replaced the invalid characters with "?". If that's what you're seeing, then you need to:

  1. Determine what encoding your string has
  2. Re-encode the string as UTF-8
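For step 1, a useful first check is whether the bytes are already well-formed UTF-8. The following is a minimal sketch using only the standard library; the function name is_valid_utf8 is hypothetical and not part of RapidJSON or any other library. Passing the check does not prove the text was meant as UTF-8, but non-ASCII text in a legacy single-byte encoding will usually fail it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8
// (no stray continuation bytes, truncated sequences, overlong forms,
// surrogates, or code points above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;  // total bytes in this sequence
        char32_t cp = 0;      // decoded code point
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // continuation byte or invalid lead byte
        if (len > n - i) return false;  // sequence is cut off at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

If is_valid_utf8(stream_data) returns false, the string needs to be re-encoded before it is handed to SetString.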

If your string is ASCII, then SetString will work fine, because ASCII is a subset of UTF-8: every ASCII string is already valid UTF-8.
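Checking for pure ASCII is cheap: every byte must be below 0x80. A small sketch, with is_ascii as a hypothetical helper name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if every byte is in the 7-bit ASCII range.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        // Cast first: plain char may be signed, so bytes >= 0x80 can look negative.
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```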

If your string is UTF-16 or UTF-32 encoded, there are several lightweight, portable libraries that can do the conversion, such as utfcpp. C++11 added a standard API for this (std::wstring_convert with the <codecvt> facets), but it was poorly supported and has been deprecated since C++17.
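To illustrate what such a conversion involves, here is a hand-written sketch that turns a std::u16string into UTF-8 without the deprecated <codecvt> API. The names append_utf8 and utf16_to_utf8 are hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch; a library such as utfcpp may handle that case differently.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point.
// Assumes cp <= U+10FFFF and that cp is not a surrogate.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 code units to a UTF-8 byte string.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if present.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // low surrogate with no preceding high surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can then be passed to SetString in the same way as any other UTF-8 data.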

If your string is encoded with a legacy encoding like Windows-1252, then you might need to use either an OS API like MultiByteToWideChar on Windows, or a heavyweight Unicode library like ICU, to convert the data to a more standard encoding.
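For a single-byte encoding as small as Windows-1252, a lookup table is also enough. The sketch below is a portable alternative to the OS and library routes, not a replacement for them; windows1252_to_utf8 is a hypothetical name. It assumes that the five bytes Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) should map to the C1 control code points with the same value.

```cpp
#include <string>

// Hypothetical helper: re-encodes Windows-1252 bytes as UTF-8.
std::string windows1252_to_utf8(const std::string& in) {
    // Code points for bytes 0x80..0x9F. Bytes 0xA0..0xFF match Unicode directly.
    // Unassigned bytes are mapped to same-valued C1 controls (an assumption).
    static const char16_t high[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
    };
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        unsigned cp = b;
        if (b >= 0x80 && b <= 0x9F) cp = high[b - 0x80];
        if (cp < 0x80) {
            out += ch;  // ASCII passes through unchanged
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Every code point in the table fits in three UTF-8 bytes.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This only helps once you know the input really is Windows-1252; other legacy code pages need their own tables, which is where ICU or the OS APIs become the more practical choice.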
