Using a FilePath when interfacing with a function that uses fopen on Windows

Time:06-22

I'm using the FFI to work with a portable C library. However, I'm running into issues when trying to use some of the library functions that use fopen internally, probably due to filename encoding.

GHC offers some mechanisms that can be used to control the encoding used with functions like withCString. E.g., the following code should marshal a Haskell String value as a nul-terminated C string using the encoding used by the file system:

import qualified GHC.Foreign as GHC
import qualified GHC.IO.Encoding as GHC

main :: IO ()
main = do
  encoding <- GHC.getFileSystemEncoding
  -- the C string is only valid inside the callback
  GHC.withCString encoding "my example" $ \_cstr ->
    pure ()  -- pass _cstr to the foreign function here

This appears to work well on Unix-like systems, but not on Windows: my program cannot find files whose names contain umlauts.

The documentation of getFileSystemEncoding comes with a warning that hints at the problem:

On Windows, this encoding should not be used if possible because the use of code pages is deprecated: Strings should be retrieved via the "wide" W-family of UTF-16 APIs instead

But this gives no info on how to deal with a filepath that will be passed to a foreign function.

Minimal example

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C (CString)
import Foreign.Ptr (Ptr, nullPtr)
import qualified GHC.Foreign as GHC
import qualified GHC.IO.Encoding as GHC

filename = "Ümlauts.txt"

main = do
  writeFile filename "content"  -- ensure file exists
  fsEncoding <- GHC.getFileSystemEncoding
  GHC.withCString fsEncoding filename $ \fscpath -> do
    handle <- greeting_fopen fscpath
    if handle == nullPtr
      then error $ "Could not open file " ++ filename
      else do
        putStrLn "file opened successfully"

-- Library function that uses fopen internally
foreign import ccall "greeting.c greeting_fopen"
  greeting_fopen :: CString -> IO (Ptr ())

where greeting.c contains

#include <stdio.h>
#include <stdlib.h>

FILE *greeting_fopen (const char *filename) {
  printf("Hello, now trying to open %s\n", filename);
  return fopen (filename, "r");
}

Running ghc --make greeting.c main.hs and executing the resulting binary succeeds on Linux, but fails on Windows due to the file not being found.

Is there a way to make this work?

Answer:

The Windows documentation for fopen states:

The fopen function opens the file that is specified by filename. By default, a narrow filename string is interpreted using the ANSI codepage (CP_ACP).

Therefore, that's the code page that must be used to encode the filename when it is passed to the library function. We can't pass CPACP (or CP_ACP) as the argument for System.IO.mkTextEncoding, as that function only supports numerical code pages like "CP1252".

However, CP_ACP is available as code page 0, so we can use

fsEncoding <- System.IO.mkTextEncoding "CP0"

Of course, this won't work on Linux, so we need some ugly CPP code (with the CPP language extension enabled) like

#if defined(mingw32_HOST_OS)
  fsEncoding <- mkTextEncoding "CP0"
#else
  fsEncoding <- getFileSystemEncoding
#endif
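
For context, here is a minimal sketch of how the conditional above might be wrapped into a reusable helper. The names fopenEncoding and withFopenPath are hypothetical and not part of any library; the use of "CP0" on Windows simply follows the answer's suggestion. The demo in main only round-trips an ASCII name and does not call the C library.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import Foreign.C (CString)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getFileSystemEncoding, mkTextEncoding)

-- | Hypothetical helper: the encoding that fopen is assumed to expect.
-- On Windows this is the ANSI code page ("CP0", as suggested above);
-- elsewhere it is GHC's file system encoding.
fopenEncoding :: IO TextEncoding
#if defined(mingw32_HOST_OS)
fopenEncoding = mkTextEncoding "CP0"
#else
fopenEncoding = getFileSystemEncoding
#endif

-- | Hypothetical helper: marshal a FilePath as a nul-terminated C string
-- in that encoding. The pointer is only valid inside the callback.
withFopenPath :: FilePath -> (CString -> IO a) -> IO a
withFopenPath path action = do
  enc <- fopenEncoding
  GHC.withCString enc path action

main :: IO ()
main = do
  enc <- fopenEncoding
  -- Round trip: encode the path, then decode it again with the same encoding.
  roundTripped <- withFopenPath "example.txt" (GHC.peekCString enc)
  print roundTripped
```

In the minimal example from the question, withFopenPath filename would take the place of GHC.withCString fsEncoding filename, with greeting_fopen called inside the callback.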

The code may still fail if the filename cannot be represented in the ANSI codepage, but that seems to be an unfixable limitation.
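
One way to detect this case before calling into the library is to check each character with GHC.Foreign.charIsRepresentable. The sketch below is only an illustration: isRepresentable is a hypothetical helper, and latin1 is used as a stand-in for an ANSI code page so that the example behaves the same on every platform. On Windows one would pass the encoding obtained from mkTextEncoding "CP0" instead.

```haskell
module Main (main) where

import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, latin1)

-- | Hypothetical helper: True when every character of the path can be
-- encoded with the given encoding and decoded back unchanged.
isRepresentable :: TextEncoding -> FilePath -> IO Bool
isRepresentable enc path = and <$> mapM (GHC.charIsRepresentable enc) path

main :: IO ()
main = do
  -- latin1 is an assumption standing in for the ANSI code page.
  -- The first name uses only characters that exist in Latin-1;
  -- the second contains a character outside of it.
  mapM_ check ["Ümlauts.txt", "€uro.txt"]
  where
    check name = do
      ok <- isRepresentable latin1 name
      -- 'print' escapes non-ASCII characters, so the output does not
      -- depend on the terminal's encoding.
      print (name, ok)
```

A caller could use such a check to report a clear error instead of letting fopen fail with a generic "file not found".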


The use of "CP0" is a bit of a hack; for completeness, here's some code that uses getACP from the Windows API to get the right codepage.

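import Data.Word (Word32)
import GHC.IO.Encoding (TextEncoding)
import GHC.IO.Encoding.CodePage (codePageEncoding)

-- GetACP returns the identifier of the current ANSI code page
foreign import ccall unsafe "windows.h GetACP"
    getACP :: IO Word32

getFileSystemEncoding :: IO TextEncoding
getFileSystemEncoding =
  codePageEncoding <$> getACP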