I am trying to print a Unicode character in MASM32 assembly, but I can't make it work. Here is a reproducible example:
.DATA
output db "%x Hello",10,0
unicode DWORD "∟", 0
.DATA?
.CODE
start:
push offset unicode
push offset output
call crt_printf
invoke ExitProcess, 0
end start
Current output :
40300a Hello
Expected output :
∟ Hello
CodePudding user response:
The desired character seems to be └ (U+2514, Box Drawings Light Up and Right), which is encoded as the bytes E2 94 94 in UTF-8, or as the bytes 14 25 in UTF-16LE.
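To make the two encodings concrete, here is how they could be written out explicitly in a MASM data section, so that nothing depends on how the editor saved the source file. This is only an illustrative fragment; the labels box_utf8 and box_utf16 are made up for this example.

```asm
.DATA
; U+2514 BOX DRAWINGS LIGHT UP AND RIGHT, spelled out numerically
box_utf8    db 0E2h, 94h, 94h   ; UTF-8: three bytes
box_utf16   dw 2514h            ; UTF-16LE: one 16-bit unit, stored in memory as 14h, 25h
```

Because x86 is little-endian, dw 2514h and db 14h,25h produce the same two bytes in memory.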
I don't know how your text editor encoded the source line unicode DWORD "∟", 0, but the function crt_printf (which I could not reproduce) does not seem to cope with it.
MS Windows works with UTF-16LE, so you'll need the WinAPI function WriteConsoleW, with its lpBuffer defined as:
unicode db 14h,25h
output dw " ","H","e","l","l","o",10,0
nNumberOfCharsToWrite EQU ($-unicode)/2 ; Number of 16bit characters.
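A complete program built around that buffer might look roughly like the sketch below. It assumes the MASM32 SDK is installed in its default \masm32 location, so that masm32rt.inc supplies the processor and model directives along with the kernel32 prototypes. The names wtext, WTEXT_CHARS, hOut and written are hypothetical and chosen only for this illustration. Treat it as a sketch under those assumptions rather than a verified build.

```asm
include \masm32\include\masm32rt.inc    ; assumption: default MASM32 SDK install path

.DATA
; "└ Hello" plus a line feed, as UTF-16LE code units.
; No terminating 0: WriteConsoleW takes an explicit character count.
wtext       dw 2514h, " ", "H", "e", "l", "l", "o", 10
WTEXT_CHARS EQU ($ - wtext) / 2         ; number of 16-bit characters

.DATA?
hOut        dd ?                        ; console output handle
written     dd ?                        ; receives the number of characters written

.CODE
start:
    invoke GetStdHandle, STD_OUTPUT_HANDLE
    mov hOut, eax
    invoke WriteConsoleW, hOut, offset wtext, WTEXT_CHARS, offset written, NULL
    invoke ExitProcess, 0
end start
```

This would need to be linked as a console program (for example ml /c /coff followed by link /SUBSYSTEM:CONSOLE). WriteConsoleW only works on a real console handle, so it is expected to fail if the output is redirected to a file or pipe, and the glyph will only show if the console font contains it.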
Questions related to Microsoft Macro Assembler also have a dedicated MASM Forum.
Printing Unicode strings might be easier in other assemblers, for instance with the macro StdOutput in €ASM:
rchg PROGRAM Format=PE,Entry=start
INCLUDE winapi.htm
[.data]
Buffer DB 14h,25h
DU " Hello",10,0
[.text]
start: StdOutput Buffer,Console=yes,Unicode=yes
WinAPI ExitProcess, 0
ENDPROGRAM
The previous source compiles and works fine:
R:\>euroasm.exe rchg.asm
I0010 EuroAssembler version 20191104 started.
I0020 Current directory is "R:\".
I0180 Assembling source file "rchg.asm".
I0470 Assembling program "Rchg".
I0510 Assembling program pass 1.
I0510 Assembling program pass 2.
I0530 Assembling program pass 3 - final.
I0660 32bit FLAT PE file "Rchg.exe" created, size=16732.
I0650 Program "Rchg" assembled in 3 passes with errorlevel 0.
I0750 Source "rchg" (1229 lines) assembled in 2 passes with errorlevel 0.
I0860 Listing file "rchg.asm.lst" created, size=1753.
I0980 Memory allocation 960 KB. 28249 statements assembled in 1 s.
I0990 EuroAssembler terminated with errorlevel 0.
R:\>rchg.exe
└ Hello
CodePudding user response:
You're using %x to print a number as hex digits, and the number you're passing is an address. So 0x40300a is the address of the unicode label in your .data section.
%s should probably work, if it and the output terminal support the same encoding that your editor and assembler used. It should just copy bytes from the address you pass until reaching a 0 byte, so it should Just Work for UTF-8, but not for UTF-16 if there's a 0 byte somewhere in there. %ls could work if supported, treating the arg as a wchar_t* string.
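As a rough illustration of the %s route with UTF-8, something like the following could be tried. It assumes the MASM32 SDK in its default \masm32 location (masm32rt.inc declares crt_printf and the kernel32 functions), and it writes the UTF-8 bytes numerically so the result does not depend on the editor's encoding. The labels fmt and utf8ch are hypothetical. Whether the character actually displays depends on the Windows version, the CRT behind crt_printf, and the console font, so this is a sketch to experiment with, not a guaranteed fix.

```asm
include \masm32\include\masm32rt.inc    ; assumption: default MASM32 SDK install path

.DATA
fmt     db "%s Hello", 10, 0
utf8ch  db 0E2h, 94h, 94h, 0            ; U+2514 as UTF-8, zero-terminated for %s

.CODE
start:
    invoke SetConsoleOutputCP, 65001    ; 65001 = UTF-8 code page for console output
    invoke crt_printf, offset fmt, offset utf8ch
    invoke ExitProcess, 0
end start
```

The call to SetConsoleOutputCP is there because %s only copies the bytes; the console still has to be told to decode them as UTF-8.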
If you wanted to pass a word or dword as a wide character for %lc, you'd push dword ptr [unicode]. Maybe. In ISO C99 and C++, %lc takes a wint_t arg, and prints it like it would a wchar_t[2] string (I think with the 2nd element being a terminating 0, if that's what cppreference means). But Microsoft has persistently declined to support standard C and C++ features, especially around printf, so who knows what crt_printf supports.