I was wondering why you would use the `and` instruction instead of the `sub` instruction when converting lowercase ASCII characters to uppercase ones.
mov dx, 'a'
sub dx, 32
vs
mov dx, 'a'
and dx, 11011111b
CodePudding user response:
There's no performance or correctness difference if you already know the input is a lower-case alphabetic character. `and` has the advantage when you know it's alphabetic but it might already be upper-case, since it leaves upper-case letters unmodified. (It is also useful as part of detecting alphabetic characters and normalizing them to one case, either `and` with `~0x20` or `or` with `0x20`, as in "What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?")
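To make the difference concrete, here is a minimal sketch in C rather than assembly. The helper names `upper_by_sub` and `upper_by_and` are hypothetical and exist only for illustration; each one mirrors one of the two instruction choices from the question.

```c
#include <stdint.h>

/* Hypothetical helpers, named for illustration only. */

/* Mirrors `sub dx, 32`: only correct if c is already lower-case. */
static uint8_t upper_by_sub(uint8_t c) {
    return (uint8_t)(c - 32);
}

/* Mirrors `and dx, 11011111b`: clears bit 5 (0x20), so a letter
   that is already upper-case passes through unchanged. */
static uint8_t upper_by_and(uint8_t c) {
    return (uint8_t)(c & 0xDF);
}
```

Both helpers map 'a' to 'A'. Given 'A', the `and` version returns 'A', while the `sub` version returns 33, which is '!'.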
If the next instruction is a `jcc` like `jnz`, `sub` and `and` are equally able to macro-fuse with it into a single uop on Intel Sandybridge-family CPUs, so no advantage there.
If using it in a loop over a zero-terminated C string, you might be doing something like `movzx edx, byte [rdi]` / `and edx, ~0x20` / `jnz .loop` at the bottom of a loop, since all alphabetic characters have non-zero bits other than the lower-case bit. (`0x20` is ASCII space, so this loop also exits on a space, not only on the terminating zero.)
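The exit condition of that loop can be sketched in C, as a rough model of the three-instruction sequence and not a literal translation. The function name `span_until_nul_or_space` is a hypothetical one chosen for this example.

```c
#include <stddef.h>

/* Hypothetical helper: counts bytes until (c & ~0x20) == 0,
   which happens for the NUL terminator and for a space (0x20). */
static size_t span_until_nul_or_space(const char *s) {
    size_t n = 0;
    for (;;) {
        unsigned c = (unsigned char)s[n]; /* movzx edx, byte [rdi] */
        c &= ~0x20u;                      /* and edx, ~0x20        */
        if (c == 0) {                     /* jnz .loop not taken   */
            break;
        }
        n++;
    }
    return n;
}
```

For "ab cd" this stops at the space and returns 2. A tab (0x09) still has non-zero bits after masking, so it does not end the loop.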
Using `sub` in that case lets you exit a loop on any character less than space, i.e. the terminating zero, control characters, tabs, or newline. `sub edx, 0x20` / `ja .loop` also exits on a space itself; use `jae .loop` instead to keep looping even on a space (but still not tab or newline).
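The two branch choices differ only in how a space is treated. A minimal C sketch of both loop conditions follows, assuming unsigned byte comparisons stand in for the flags that `sub edx, 0x20` sets; the names `span_ja` and `span_jae` are hypothetical.

```c
#include <stddef.h>

/* Models `sub edx, 0x20` / `ja .loop`: keep looping only while
   the byte is strictly above 0x20, so a space ends the loop. */
static size_t span_ja(const char *s) {
    size_t n = 0;
    while ((unsigned char)s[n] > 0x20) {
        n++;
    }
    return n;
}

/* Models `sub edx, 0x20` / `jae .loop`: keep looping while the
   byte is at or above 0x20, so a space does not end the loop. */
static size_t span_jae(const char *s) {
    size_t n = 0;
    while ((unsigned char)s[n] >= 0x20) {
        n++;
    }
    return n;
}
```

On "ab cd", `span_ja` returns 2 and `span_jae` returns 5. Both stop at a tab, a newline, or the terminating zero.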
CodePudding user response:
Either one is acceptable; it's just a matter of preference. I like to use `and` myself. It shouldn't matter as long as you've checked that your character is between 'a' and 'z' first.
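The range check mentioned here could look like the following C sketch. The function name `to_upper_ascii` is hypothetical, and the code assumes plain ASCII input.

```c
/* Hypothetical helper: converts only 'a'..'z', leaving every
   other byte untouched. Assumes ASCII. */
static char to_upper_ascii(char c) {
    if (c >= 'a' && c <= 'z') {
        return (char)(c & ~0x20); /* clear the lower-case bit */
    }
    return c;
}
```

Without the check, masking would corrupt non-letters: '1' is 0x31, and clearing bit 5 would turn it into 0x11, a control character.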