I'm using QFileDialog::getOpenFileName() to have the user select a file, but I need the result to be a C string, since I have to pass it to something written in C which uses fopen(). I cannot change this.
The problem I'm finding is that, on Windows/MinGW, using toStdString() on the resulting QString doesn't work well with Unicode/non-ASCII filenames. Trying to open the file based on the std::string fails, because some character set conversion seems to be occurring. Sometimes using toLocal8Bit() to convert works, but sometimes it doesn't.
Consider the following (MinGW) program:
#include <cstdio>
#include <iostream>
#include <QApplication>
#include <QFileDialog>
#include <QFile>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    auto filename = QFileDialog::getOpenFileName();
    QFile f(filename);
    std::cout << "fopen: " << (std::fopen(filename.toStdString().c_str(), "r") != nullptr) << std::endl;
    std::cout << "fopen (local8bit): " << (std::fopen(filename.toLocal8Bit().data(), "r") != nullptr) << std::endl;
    std::cout << "Qt can open: " << f.open(QIODevice::ReadOnly) << std::endl;
}
- For a file called ☢.txt, toStdString() works, toLocal8Bit() doesn't.
- For a file called ä.txt, toStdString() doesn't work, toLocal8Bit() does.
- For a file called Ȁ.txt, neither works.
In all cases, though, QFile is able to open the file. I suppose it's probably using Unicode Windows functions while the C code is using fopen(), which, to my understanding, is a so-called ANSI function on Windows. But is there any way to get a “bag of bytes”, so to speak, from a QString? I don't care about the encoding of the filename, I just want something that can be passed to fopen() to open the file.
I've found that using GetShortPathName to get a short filename from filename.toWCharArray() seems to work, but that's very cumbersome, and my understanding is that NTFS filesystems can be told not to support short names, so it's not a viable solution in general anyway.
CodePudding user response:
File paths passed to the non-Unicode API of Windows are interpreted either in the current ANSI (Microsoft codec) code page or in the OEM code page (see also https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfope). ANSI is the default.
So your question translates to: How can I convert a UTF-8 or UTF-16 string to ANSI or OEM?
There's an answer for the ANSI conversion: How to convert from UTF-8 to ANSI using standard c
Anyhow, it's important to realize that not every Unicode string can be represented in these narrower code pages, so for some filenames no such conversion exists.
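To illustrate the representability problem, here is a minimal sketch in plain C++ (no Qt or Windows calls). It assumes Latin-1 (ISO 8859-1) as a simplified stand-in for an ANSI code page; real code pages such as Windows-1252 differ in detail, and the helper name toLatin1Lossless is hypothetical. The point is only that a conversion to a single-byte code page can fail for some names.

```cpp
#include <string>

// Hypothetical helper: converts UTF-16 to Latin-1 (ISO 8859-1), used here only
// as a simplified stand-in for a narrow "ANSI" code page.
// Returns false (leaving `out` untouched) if any character has no
// single-byte representation.
bool toLatin1Lossless(const std::u16string &utf16, std::string &out)
{
    std::string result;
    result.reserve(utf16.size());
    for (char16_t unit : utf16) {
        // Latin-1 covers exactly U+0000..U+00FF; anything above that
        // (including surrogate pairs) cannot be stored in one byte.
        if (unit > 0xFF)
            return false;
        result.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
    }
    out = result;
    return true;
}
```

Under this assumption, ä.txt (ä is U+00E4) converts to a byte string starting with 0xE4, while ☢.txt (U+2622) and Ȁ.txt (U+0200) are rejected. Which names actually convert on a given Windows machine depends on its active code page.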
CodePudding user response:
Where a const char * is expected, I use myQstring.toUtf8().constData(). But be careful not to pass a pointer into a temporary that has already been destroyed. Watch out for situations like this:
const char *cstr = myQstring.toUtf8().constData(); // toUtf8() creates a temporary variable
someCStrFunction(cstr); // !!! NO !!! do NOT do anything with cstr here, now it already points to freed memory ...
Use this instead:
QByteArray bytes = myQstring.toUtf8();
someCStrFunction(bytes.constData()); // this is fine
Note that in this case the C string is guaranteed to be null-terminated. https://doc.qt.io/qt-5/qbytearray.html#constData
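The lifetime rule behind this can be sketched without Qt. In the example below, std::string stands in for QByteArray, and makeBytes and someCStrFunction are hypothetical placeholders for a conversion such as toUtf8() and for the C API. A temporary lives until the end of the full expression that created it, so passing its pointer directly as a call argument is fine; only storing the pointer and using it in a later statement is a bug.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API that only accepts const char *.
std::size_t someCStrFunction(const char *s) { return std::strlen(s); }

// Hypothetical stand-in for a conversion like toUtf8(): returns a new
// buffer by value, so each call produces a temporary.
std::string makeBytes() { return "example.txt"; }

std::size_t callSafely()
{
    // Fine: the temporary is destroyed only after the whole statement,
    // so the pointer is valid for the duration of the call.
    std::size_t a = someCStrFunction(makeBytes().c_str());

    // Also fine: a named owner keeps the buffer alive for the whole scope.
    const std::string bytes = makeBytes();
    std::size_t b = someCStrFunction(bytes.c_str());

    return a + b;
}
```

This is also why the fopen(filename.toStdString().c_str(), "r") call in the question has no lifetime problem, provided the callee does not keep the pointer after it returns.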
Where wide-char C strings are expected (Windows API), I use myQString.utf16(). Again, be careful not to use a pointer into a destroyed temporary. This string is also null-terminated: https://doc.qt.io/qt-5/qstring.html#utf16
This is how I am using it in my code on Windows, macOS and Linux. Seems to work for me at all times.