Printing unicode character in MASM32 assembly

Time:06-06

I am trying to print a Unicode character in MASM32 assembly, but I can't make it work. Here is a reproducible example:

.DATA
output      db    "%x Hello",10,0
unicode     DWORD "∟", 0

.DATA?

.CODE
start:
        push offset unicode
        push offset output
        call crt_printf
        
        invoke  ExitProcess, 0

end start

Current output : 40300a Hello

Expected output : ∟ Hello

CodePudding user response:

The desired character seems to be └ (U+2514, Box Drawings Light Up and Right), which is encoded as the bytes E2 94 94 in UTF-8 and as the bytes 14 25 in UTF-16LE.
I don't know how your text editor encoded the source line unicode DWORD "∟", 0, but the function crt_printf (which I cannot reproduce here) does not seem to cope with it.

MS Windows works with UTF-16LE, so you'll need the WinAPI function WriteConsoleW, with its lpBuffer defined as:

    unicode db 14h,25h                        ; U+2514 as UTF-16LE bytes
    output  dw " ","H","e","l","l","o",10,0
    nNumberOfCharsToWrite EQU ($-unicode)/2   ; number of 16-bit characters, including the trailing 0
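
For context, here is a minimal sketch of how such a buffer might be passed to WriteConsoleW in a complete MASM32 program. It assumes the MASM32 SDK is installed under \masm32 and that the console font contains the glyph. The names wmsg, WMSG_LEN, hConsole and nWritten are illustrative and do not come from the question.

```asm
; Sketch only: assumes the MASM32 SDK is installed under \masm32.
.386
.model flat, stdcall
option casemap:none

include    \masm32\include\windows.inc
include    \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.DATA
; UTF-16LE text: U+2514, then " Hello", then CR LF.
; No terminating 0 is stored, because WriteConsoleW takes an explicit count.
wmsg      dw 2514h, " ", "H", "e", "l", "l", "o", 13, 10
WMSG_LEN  EQU ($ - wmsg) / 2        ; number of 16-bit characters

.DATA?
hConsole  DWORD ?                   ; console output handle
nWritten  DWORD ?                   ; receives the number of characters written

.CODE
start:
        invoke  GetStdHandle, STD_OUTPUT_HANDLE
        mov     hConsole, eax
        invoke  WriteConsoleW, hConsole, offset wmsg, WMSG_LEN, offset nWritten, NULL
        invoke  ExitProcess, 0
end start
```

WriteConsoleW only works on a real console handle. If the output is redirected to a file or a pipe, the call fails and WriteFile would be needed instead.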

Questions related to Microsoft Macro Assembler also have a dedicated MASM Forum.

Printing Unicode strings might be easier in other assemblers, for instance with the StdOutput macro in €ASM:

rchg PROGRAM Format=PE,Entry=start
       INCLUDE winapi.htm
[.data]       
Buffer DB 14h,25h
       DU " Hello",10,0
[.text]       
start: StdOutput Buffer,Console=yes,Unicode=yes
       WinAPI ExitProcess, 0
     ENDPROGRAM

The previous source compiles and works fine:

R:\>euroasm.exe rchg.asm
I0010 EuroAssembler version 20191104 started.
I0020 Current directory is "R:\".
I0180 Assembling source file "rchg.asm".
I0470 Assembling program "Rchg". 
I0510 Assembling program pass 1. 
I0510 Assembling program pass 2. 
I0530 Assembling program pass 3 - final.
I0660 32bit FLAT PE file "Rchg.exe" created, size=16732. 
I0650 Program "Rchg" assembled in 3 passes with errorlevel 0. 
I0750 Source "rchg" (1229 lines) assembled in 2 passes with errorlevel 0.
I0860 Listing file "rchg.asm.lst" created, size=1753.
I0980 Memory allocation 960 KB. 28249 statements assembled in 1 s.
I0990 EuroAssembler terminated with errorlevel 0.

R:\>rchg.exe
└ Hello

CodePudding user response:

You're using %x to print a number as hex digits, and the number you're passing is an address. So 0x40300a is the address of the unicode label in your .data section.

%s should probably work, if it and the output terminal support the same encoding that your editor and assembler used. It should just copy bytes from the address you pass, until reaching a 0, so it should Just Work for UTF-8. But not for UTF-16, if there's a 0 byte somewhere in there. %ls could work if supported, treating the arg as a wchar_t* string.
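
A minimal sketch of the %s approach, under several assumptions: the MASM32 SDK is installed under \masm32, the character is stored as explicit UTF-8 bytes so the editor's encoding does not matter, and the console is switched to the UTF-8 code page. The label glyph is illustrative, and the bytes encode └ (U+2514), the character named in the other answer. Whether the glyph actually appears depends on the Windows version and the console font.

```asm
; Sketch only: assumes the MASM32 SDK under \masm32 and a UTF-8 capable console.
.386
.model flat, stdcall
option casemap:none

include    \masm32\include\windows.inc
include    \masm32\include\kernel32.inc
include    \masm32\include\msvcrt.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib

.DATA
output  db "%s Hello", 10, 0        ; %s copies bytes until the first 0
glyph   db 0E2h, 94h, 94h, 0        ; U+2514 as UTF-8, NUL-terminated

.CODE
start:
        invoke  SetConsoleOutputCP, 65001   ; 65001 = UTF-8 code page
        push    offset glyph                ; argument consumed by %s
        push    offset output               ; format string
        call    crt_printf
        add     esp, 8                      ; cdecl: caller removes the two arguments
        invoke  ExitProcess, 0
end start
```

The only changes from the question's program are the %s conversion, the explicit UTF-8 bytes, the code page call, and the stack cleanup after crt_printf.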

If you wanted to pass a word or dword as a wide character for %lc, you'd push dword ptr [unicode]. Maybe. In ISO C99 and C++, %lc takes a wint_t arg and prints it like it would a wchar_t[2] string (I think with the 2nd element being a terminating 0, if that's what cppreference means). But Microsoft has persistently declined to support standard C and C++ features, especially around printf, so who knows what crt_printf supports.
