Getting a std::string or C string from a QString representing an arbitrary filename on Windows

Time:02-19

I'm using QFileDialog::getOpenFileName() to have the user select a file, but I need the result to be a C string, since I have to pass it to something written in C which uses fopen(). I cannot change this.

The problem I'm finding is that, on Windows/MinGW, using toStdString() on the resulting QString doesn't work well with Unicode/non-ASCII filenames. Trying to open the file based on the std::string fails, because some character set conversion seems to be occurring. Sometimes using toLocal8Bit() to convert works, but sometimes it doesn't.

Consider the following (MinGW) program:

#include <cstdio>
#include <iostream>

#include <QApplication>
#include <QFileDialog>
#include <QFile>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    auto filename = QFileDialog::getOpenFileName();
    QFile f(filename);

    std::cout << "fopen: " << (std::fopen(filename.toStdString().c_str(), "r") != nullptr) << std::endl;
    std::cout << "fopen (local8bit): " << (std::fopen(filename.toLocal8Bit().data(), "r") != nullptr) << std::endl;
    std::cout << "Qt can open: " << f.open(QIODevice::ReadOnly) << std::endl;
}
  • For a file called ☢.txt, toStdString() works, toLocal8Bit() doesn't.
  • For a file called ä.txt, toStdString() doesn't work, toLocal8Bit() does.
  • For a file called Ȁ.txt, neither works.

In all cases, though, QFile is able to open the file. I suppose it's using the Unicode Windows functions, while the C code is using fopen(), which, to my understanding, is a so-called ANSI function on Windows. But is there any way to get a “bag of bytes”, so to speak, from a QString? I don't care about the encoding of the filename, I just want something that can be passed to fopen() to open the file.

I've found that using GetShortPathName to get a short filename from filename.toWCharArray() seems to work, but that's very cumbersome, and my understanding is that NTFS filesystems can be told not to support short names, so it's not a viable solution in general anyway.

CodePudding user response:

File paths passed to the non-Unicode Windows API are interpreted either in the current ANSI (Microsoft codec) code page or in the OEM code page (see also https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfope). ANSI is the default.

So your question translates to: How can I convert a UTF-8 or UTF-16 string to ANSI or OEM?

There's an existing answer for the ANSI conversion: "How to convert from UTF-8 to ANSI using standard c".

Anyhow, it's important to realize that not every Unicode string can be represented in these narrower code pages.
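
To illustrate that last point, here is a minimal sketch of a strict conversion that refuses to produce a result when a character does not fit. It uses ISO-8859-1 (Latin-1) as a simplified stand-in for a Western ANSI code page, which is an assumption made to keep the example portable; the helper name toLatin1Strict is hypothetical, and on Windows the real conversion would go through the system's own code page tables instead.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: converts UTF-32 text to ISO-8859-1 (Latin-1), used
// here only as a simplified stand-in for a Windows ANSI code page.
// Returns std::nullopt if any character has no Latin-1 representation.
std::optional<std::string> toLatin1Strict(const std::u32string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp > 0xFF) {
            // Not representable: refuse rather than silently substitute '?'.
            return std::nullopt;
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return out;
}
```

Under this stand-in encoding, a name such as ä.txt (U+00E4) converts, while ☢.txt (U+2622) and Ȁ.txt (U+0200) do not, so no narrow string in that encoding can name those files.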

CodePudding user response:

Where a const char * is expected, I use myQstring.toUtf8().constData(). But be careful not to keep a pointer into a temporary that has already been destroyed. Watch out for situations like this:

const char *cstr = myQstring.toUtf8().constData(); // toUtf8() creates a temporary variable
someCStrFunction(cstr); // !!! NO !!! do NOT do anything with cstr here, now it already points to freed memory ...

Use this instead:

QByteArray bytes = myQstring.toUtf8();
someCStrFunction(bytes.constData()); // this is fine

Note that in this case the C string is guaranteed to be null-terminated: https://doc.qt.io/qt-5/qbytearray.html#constData

Where wide-char C strings are expected (Windows API), I use myQString.utf16(). Again, be careful not to use a pointer into a destroyed temporary. This string is also null-terminated: https://doc.qt.io/qt-5/qstring.html#utf16

This is how I am using it in my code on Windows, macOS and Linux. Seems to work for me at all times.
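
For the wide-character route mentioned above, a rough sketch of a portable open helper might look like the following. It only applies where the call site can be changed, which the question rules out for the C library itself. The names utf16ToUtf8 and openForRead are hypothetical, the Windows branch assumes a 16-bit wchar_t and uses the CRT's _wfopen, and the other branch assumes the platform accepts UTF-8 file names.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <cwchar>
#endif

// Hypothetical helper: encodes UTF-16 as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a valid surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: opens a file for binary reading from a UTF-16 path.
std::FILE* openForRead(const std::u16string& path)
{
#ifdef _WIN32
    // Copy code unit by code unit; assumes wchar_t is 16 bits wide.
    std::wstring wide(path.begin(), path.end());
    return _wfopen(wide.c_str(), L"rb");
#else
    return std::fopen(utf16ToUtf8(path).c_str(), "rb");
#endif
}
```

With Qt 5.5 or later, filename.toStdU16String() should supply the std::u16string argument. The caller owns the returned FILE* and is expected to fclose() it.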
