dosseg
.model small
.stack 100h
.data
array db -1, -2, -3, -4, 1,2, 3, -5
.code
main PROC
mov ax, @data
mov ds, ax
xor ax, ax
xor dx, dx ; reset dx
lea si, array
mov cx, 8
back:
mov bl, [si]
cmp al, bl
jc continue ; carry will be generated if number in bl is positive
inc dx
continue:
inc si
clc
loop back
mov ah, 4ch
int 21h
main ENDP
end main
I wrote the above program to find the number of negative integers in an array.
Debugging showed that when SI is pointing at -1, the carry flag becomes 1 but it should not as the value at that instant in BL is FFh (negative) and in AL is 00h, so subtracting negative number from 0 should not generate a carry. What am I doing wrong?
Edit: I replaced the erroneous part with :
test bl, bl
jns continue
and now it works as expected but I still don't know why the cmp
method did not work.
CodePudding user response:
When you compare al=0
with bl
, carry flag (alias Below flag) will be set for any value in bl
except for bl=0
, because 0 is below any unsigned number in the range 0x01..0xFF.
Your array contains 8bit signed integer numbers. When we compare signed numbers, instead of adverbs below|above we use lower|greater which take into account Signum flag and Overflow flag.
Instead of jc continue ; carry will be generated if number in bl is positive
use
jle continue ; Jump if al=bl or if SF<>OF
.
Better readable solution is to replace cmp al,bl
with
test bl,bl
jns continue ; Skip incrementing dx when signum of bl is not set.
See also Jcc. You might output the result from DX using the returned errorlevel, just mov al,dl
before int 21h
.
CodePudding user response:
It's only interesting to get your condition into CF if you're going to take advantage of that
by using adc dx, 0
or sbb dx, -1
to conditionally increment DX (when CF is 1 or 0, respectively.)
The sbb
version is like dx -= -1 CF
so CF either cancels out the -1, or you subtract -1, i.e. add 1.
One way to get CF set according to the sign bit of a byte is simply to shift it out, e.g. shl bl, 1
, if you don't mind destroying the value in BL. Equivalently, add bl,bl
is also a 2-byte instruction but can run on more execution units on modern CPUs. (They both set FLAGS the same way, including CF).
It's not possible with a compare against zero. 0 - x
always has a borrow (CF=1) for any non-zero x
, and x - 0
never has carry-out.
Without modifying the register value, it is possible with cmp
, though: 0x7f - x
has unsigned wrapping (i.e. borrow output that sets CF) for x>=0x80 unsigned. i.e. for values with their MSB set.
xor dx, dx ; count = 0
mov si, OFFSET array ; LEA takes more bytes than mov-immediate. Never use LEA without a register, except for x86-64 RIP-relative
;;; The interesting part
mov al, 0x7f ; outside the loop
back: ; do {
cmp al, [si] ; CF = 0x7F <(unsigned)[SI]. i.e. MSB set in [si]
adc dx, 0 ; count negative values
;;; then the rest of the loop
inc si
cmp si, OFFSET array 8 ; the LOOP instruction isn't fast on most modern CPUs, and we're hard-coding the array length anyway. Or just put a label at the end of it and use that.
jne back ; }while(p != endp)
You don't need clc
in this or your version. CF isn't "sticky"; anything that updates its value sets it to 0 or 1 regardless of the old value. And it's not an input for cmp
.
We can't set CF=1 for bl < 0
(aka bl >= 0x80U
) with cmp bl, constant
, unfortunately. It only works the way you're doing it, setting another register to compare against. (cmp reg, 123
exists, cmp 123,reg
doesn't; most 2-operand instructions modify their destination and wouldn't make sense with an immediate destination, so it would be a special case to have yet another opcode for cmp
in the other direction.)
But you can do cmp bl, 0x80
to clear CF when bl < 0x80
, i.e. when its sign bit isn't set.
cmp byte ptr [si], 0x80 ; CF = [si] < (unsigned)0x80, i.e. non-negative
sbb dx, -1 ; count when CF=0, negative values
Loading the value into a register with mov bl, [si]
can be helpful for debugging, making it show up in your debugger's window of registers instead of having to examine memory. But that's not necessary; cmp
works with reg or memory operands (or an immediate), saving an instruction.
As a further optimization for code-size inside the loop, scasb
is equivalent to cmp al, es:[di]
/ inc di
(but the inc
part doesn't set FLAGS.) And it's actually dec di
if DF is set, so you'd want cld
somewhere in your program before a loop using "string" instructions to make sure they go in the forward direction.
Using scasb
means you need to use AL for that. Without scasb
, you could count into AL inside the loop, where it could be the exit status for your DOS call. (Perhaps that's why you were trying to use AL=0, if you wanted to exit(0) instead of returning a value.)
scasb
isn't particularly fast on modern CPUs, but it is on real 8086; so is the loop
instruction, because they're both compact code-size. loop
is a special-case optimization for dec cx
/jnz
(but also without affecting FLAGS).
If you just want to branch, use signed or signed-compare conditions
test reg,reg
/ jns
(not sign-bit-set) or jnl
(not less-than) are equivalent in this case. (Since test
with itself is equivalent to cmp
against zero, always clearing OF and CF). That uses the FLAGS and conditions for their normal semantic meaning, i.e. doing a normal signed compare. (test reg,reg
is a well-known optimization for cmp reg,0
)
SSE2 popcnt to check 8 bytes at once
The extra fun way, since you have exactly 8 bytes, uses SSE2 and popcnt
. (Yes this can work in 16-bit real mode, unlike AVX. In a bootloader and maybe DOS you'd have to manually enable the control-register bits that make SSE instructions not fault. Of course it only works on CPUs with popcnt
, like Nehalem and later from 2008 or so, otherwise use pcmpgtb
/ psadbw
/ movq
for just SSE2, or SSE1 using MMX registers.)
movq xmm0, qword ptr [array] ; load 8 bytes (zero-extending to a 16-byte XMM reg)
pmovmskb ax, xmm0 ; pack the sign bit of each byte into an integer reg
popcnt ax, ax ; count set bits = sign bits of the bytes
Would also work easily for 4 or 16 byte arrays, or for other compile-time-constant sizes, do 2 loads and shift out overlapping bytes.
For other element sizes, there's movmskps
(dword) and movmskpd
(qword)
With a larger array, you'd want to start accumulating counts in vector regs, like pcmpgtb
to compare for 0 > x
/ psubb xmm1, xmm0
to do total -= (0 or -1)
, up to 255 iterations of 16 bytes. Then accumulate with psadbw
against zero. Same problem as How to count character occurrences using SIMD but replacing pcmpeqb
with pcmpgtb
.