I'm using the FFI to work with a portable C library. However, I'm running into issues with some of the library functions that use fopen internally, probably due to filename encoding.
GHC offers some mechanisms to control the encoding used by functions like newCString. E.g., the following code should marshal a Haskell String value into a nul-terminated C string, using the encoding used by the file system:

    import qualified GHC.Foreign as GHC
    import qualified GHC.IO.Encoding as GHC

    main = do
      encoding <- GHC.getFileSystemEncoding
      GHC.newCString encoding "my example" -- result must be freed by the caller
This appears to work well on Unix-like systems, but not on Windows: my program cannot find files whose names contain umlauts.
The documentation of getFileSystemEncoding comes with a warning that hints at the problem:
On Windows, this encoding should not be used if possible because the use of code pages is deprecated: Strings should be retrieved via the "wide" W-family of UTF-16 APIs instead
But this gives no info on how to deal with a filepath that will be passed to a foreign function.
Minimal example
    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.C (CString)
    import Foreign.Ptr (Ptr, nullPtr)
    import qualified GHC.Foreign as GHC
    import qualified GHC.IO.Encoding as GHC

    filename :: FilePath
    filename = "Ümlauts.txt"

    main :: IO ()
    main = do
      writeFile filename "content" -- ensure file exists
      fsEncoding <- GHC.getFileSystemEncoding
      GHC.withCString fsEncoding filename $ \fscpath -> do
        handle <- greeting_fopen fscpath
        if handle == nullPtr
          then error $ "Could not open file " ++ filename
          else do
            putStrLn "file opened successfully"

    -- Library function that uses fopen internally
    foreign import ccall "greeting.c greeting_fopen"
      greeting_fopen :: CString -> IO (Ptr ())
where greeting.c contains

    #include <stdio.h>
    #include <stdlib.h>

    FILE *greeting_fopen (const char *filename) {
      printf("Hello, now trying to open %s\n", filename);
      return fopen (filename, "r");
    }
Running ghc --make greeting.c main.hs and executing the resulting binary succeeds on Linux, but fails on Windows because the file is not found.
Is there a way to make this work?
Answer
The Windows documentation for fopen states:
The fopen function opens the file that is specified by filename. By default, a narrow filename string is interpreted using the ANSI codepage (CP_ACP).
Therefore, that's the code page that must be used to encode the filename when it is passed to the library function. We can't pass "CPACP" (or "CP_ACP") as the argument to System.IO.mkTextEncoding, as that function only supports numeric code page names like "CP1252". However, CP_ACP is defined as code page 0, so we can use

    fsEncoding <- System.IO.mkTextEncoding "CP0"
Of course, this won't work on Linux, so we need some ugly CPP code (with the CPP language extension enabled) like

    #if defined(mingw32_HOST_OS)
      fsEncoding <- mkTextEncoding "CP0"
    #else
      fsEncoding <- getFileSystemEncoding
    #endif
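The conditional above could be wrapped in a small helper so that the rest of the program does not need to know about the platform difference. The following is only a sketch: the names getFopenEncoding and withFopenPath are hypothetical and not part of any library, and the demo main merely marshals each command-line argument instead of calling a real foreign function.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import Control.Monad (forM_)
import Foreign.C (CString)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)
import System.Environment (getArgs)
import System.IO (TextEncoding, mkTextEncoding)

-- | Encoding assumed to be expected by a C library that passes
-- narrow filenames to fopen. (Hypothetical helper name.)
getFopenEncoding :: IO TextEncoding
#if defined(mingw32_HOST_OS)
getFopenEncoding = mkTextEncoding "CP0" -- code page 0 is CP_ACP
#else
getFopenEncoding = getFileSystemEncoding
#endif

-- | Marshal a FilePath into a temporary nul-terminated C string that
-- is only valid inside the given action. (Hypothetical helper name.)
withFopenPath :: FilePath -> (CString -> IO a) -> IO a
withFopenPath path action = do
  enc <- getFopenEncoding
  GHC.withCString enc path action

main :: IO ()
main = do
  paths <- getArgs
  forM_ paths $ \path ->
    withFopenPath path $ \_cpath ->
      -- A real program would hand _cpath to the foreign function here.
      putStrLn "marshalled one path"
```

In the minimal example from the question, the getFileSystemEncoding and withCString lines would then collapse into a single call of the form withFopenPath filename $ \fscpath -> ...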
The code may still fail if the filename cannot be represented in the ANSI codepage, but that seems to be an unfixable limitation.
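To detect that situation up front instead of getting a confusing "file not found", one option is GHC.Foreign.charIsRepresentable, which checks whether a character survives an encode/decode round trip (its documentation calls the check good but not perfect). The sketch below is an illustration only: unrepresentableChars is a hypothetical helper name, and latin1 is used as a stand-in encoding so that the example also runs on non-Windows systems; on Windows one would pass the ANSI code page encoding instead.

```haskell
module Main (main) where

import Control.Monad (filterM)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, latin1)

-- | Characters of the path that do not survive a round trip through
-- the given encoding. (Hypothetical helper name.)
unrepresentableChars :: TextEncoding -> FilePath -> IO [Char]
unrepresentableChars enc =
  filterM (fmap not . GHC.charIsRepresentable enc)

main :: IO ()
main = do
  -- latin1 stands in for the ANSI code page; the euro sign is outside
  -- its range, whereas the umlaut is inside.
  bad <- unrepresentableChars latin1 "Ümlauts-€.txt"
  if null bad
    then putStrLn "filename can be encoded"
    else putStrLn ("cannot encode: " ++ show bad)
```

If the returned list is non-empty, the program can report a clear error before the filename ever reaches the C library.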
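The use of "CP0" is a bit of a hack; for completeness, here's some code that uses GetACP from the Windows API to get the right code page. Since GetACP is Windows-only, this belongs in the Windows branch of the conditional.

    import Data.Word (Word32)
    import GHC.IO.Encoding (TextEncoding)
    import GHC.IO.Encoding.CodePage (codePageEncoding)

    -- Returns the identifier of the current ANSI code page
    foreign import ccall unsafe "windows.h GetACP"
      getACP :: IO Word32

    getFileSystemEncoding :: IO TextEncoding
    getFileSystemEncoding =
      codePageEncoding <$> getACP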