Immediate value encoding in ARM assembly

I'm learning about assembly language right now and I'm a bit confused about how immediate values are encoded. Can someone explain why the following values are valid: 0xff00ff00, 0xffffffff, 0x007f8000? Also, why are the values 0xff0000ff and 0x007f9000 invalid?

From my understanding, the 12-bit immediate is split into 4 upper bits of rotation and 8 lower bits of the constant. So I thought all of the values I listed above would be invalid, because they would need more than 12 bits.

Some clarification on this topic would help so much, thanks!

Answer:

(This answer is for ARM32 mode, not Thumb2 or AArch64. Things are different there, and allowed immediates can depend on the instruction.)

You must be talking about the 12-bit encoding. It is actually a 4+8-bit encoding: 4 bits for the position and 8 bits for the pattern. The 4-bit position is doubled to give a rotate-right amount of 0 to 30, so the rotate count has to be even.
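To make the 4+8 split concrete, here is a small Python sketch (not from the thread; the helper names `ror32` and `decode_arm_imm12` are made up for illustration) that expands a 12-bit field the way described above: the low 8 bits are the pattern, and the top 4 bits, doubled, are the rotate-right amount.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_arm_imm12(imm12: int) -> int:
    """Expand an ARM32 data-processing imm12 field into its 32-bit value."""
    rotate = (imm12 >> 8) & 0xF     # 4-bit position field
    imm8 = imm12 & 0xFF             # 8-bit pattern
    return ror32(imm8, 2 * rotate)  # position is doubled, so always even


print(hex(decode_arm_imm12(0x0FF)))  # 0xff: pattern 0xff at position 0
print(hex(decode_arm_imm12(0xF81)))  # 0x204: 0x81 rotated right by 30
```

Rotating right by 30 is the same as shifting left by 2 here, which is why 0x81 becomes 0x204.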

Any value from 0 to 255 is valid: a 0x00 to 0xff pattern at position 0.
256, and any other power of two 2^N, is valid, since each is a 1-bit pattern.
257 isn't valid, since 0x101 requires a 9-bit pattern.
258 isn't valid, since its position is odd, even though the pattern fits into 8 bits (129<<1).
260 is valid (65<<2).
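The checks in the list above can be automated. This is a minimal Python sketch (hypothetical helper names) that tries all 16 even rotations and looks for one that brings the value down into 8 bits; it returns the first imm12 field that works, or None if there is none.

```python
from typing import Optional


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_imm12(value: int) -> Optional[int]:
    """Return an imm12 field that encodes `value`, or None if none exists."""
    value &= 0xFFFFFFFF
    for rotate in range(16):
        # Undo "rotate right by 2*rotate" by rotating right by 32 - 2*rotate.
        imm8 = ror32(value, 32 - 2 * rotate)
        if imm8 <= 0xFF:
            return (rotate << 8) | imm8
    return None


for value in (255, 256, 257, 258, 260):
    field = arm_imm12(value)
    print(value, "invalid" if field is None else f"valid, imm12=0x{field:03x}")
# 255 valid, imm12=0x0ff
# 256 valid, imm12=0xc01
# 257 invalid
# 258 invalid
# 260 valid, imm12=0xf41
```

Under these ARM32 rules, 0xff00ff00 and 0x007f8000 from the question also come back None (0x007f8000 is 0xff rotated right by 17, an odd amount), so they are only valid as Thumb-2 immediates.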

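There are also instructions such as mvn and cmn that make it hard to tell whether a number is valid as an immediate for mov or cmp, or for another instruction that has a partner which transforms the immediate before using it (mvn uses the bitwise NOT of its immediate, and cmn adds it instead of subtracting). The assembler can swap in the partner instruction, so a value that does not fit mov may still assemble.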
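A rough sketch of that substitution, assuming the assembler simply tries the instruction as written and then its complementary partner (mov/mvn with a bitwise NOT, cmp/cmn with a negation). This only illustrates the idea and is not GNU as's actual logic; `ALTERNATES` and `pick_encoding` are hypothetical names.

```python
from typing import Optional, Tuple


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_imm12(value: int) -> Optional[int]:
    """Return an ARM32 imm12 field for `value`, or None if it cannot be encoded."""
    value &= 0xFFFFFFFF
    for rotate in range(16):
        imm8 = ror32(value, 32 - 2 * rotate)  # undo rotate-right by 2*rotate
        if imm8 <= 0xFF:
            return (rotate << 8) | imm8
    return None


# Hypothetical table: partner instruction and how the immediate must change.
ALTERNATES = {
    "mov": ("mvn", lambda v: ~v & 0xFFFFFFFF),  # MVN writes NOT(imm)
    "mvn": ("mov", lambda v: ~v & 0xFFFFFFFF),
    "cmp": ("cmn", lambda v: -v & 0xFFFFFFFF),  # CMN adds imm, CMP subtracts it
    "cmn": ("cmp", lambda v: -v & 0xFFFFFFFF),
}


def pick_encoding(mnemonic: str, value: int) -> Optional[Tuple[str, int]]:
    """Return an encodable (mnemonic, immediate) pair, or None."""
    value &= 0xFFFFFFFF
    if arm_imm12(value) is not None:
        return mnemonic, value
    alt, transform = ALTERNATES[mnemonic]
    alt_value = transform(value)
    if arm_imm12(alt_value) is not None:
        return alt, alt_value
    return None  # neither form fits


print(pick_encoding("mov", 0xFFFFFFFF))  # ('mvn', 0)
print(pick_encoding("cmp", -1))          # ('cmn', 1)
print(pick_encoding("mov", 0x101))       # None
```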

PS: 2^4 = 16, but a 32-bit register has 32 possible rotations. That's why the position is counted in steps of two and has to be even.

Answer:

.thumb

ldr r0,=0xFF00FF00

0:  f04f 20ff   mov.w   r0, #4278255360 ; 0xff00ff00


.thumb
.cpu cortex-m0

ldr r0,=0xFF00FF00

00000000 <.text>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <.text+0x4>)
   2:   0000        .short  0x0000
   4:   ff00ff00    .word   0xff00ff00

Look at the ARM documentation; it clearly documents how the immediate encodings work, and by extension what you cannot do. The Thumb-2 extensions add more options, as shown above (armv6-m vs armv7-m, or armv7-a).
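For the question's original values, the relevant rules are the Thumb-2 "modified immediates" (the ThumbExpandImm pseudocode in the ARMv7-M and ARMv7-A Architecture Reference Manuals). Summarised from that pseudocode, a value is representable if it is a byte XY in one of the replicated forms 0x000000XY, 0x00XY00XY, 0xXY00XY00 or 0xXYXYXYXY, or an 8-bit pattern with its top bit set, rotated right by 8 to 31 bits. The Python below is a sketch of that check (hypothetical helper names); verify against the manual for the specific instruction, since not every instruction accepts these immediates.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_thumb2_modified_imm(value: int) -> bool:
    """Sketch of the Thumb-2 modified-immediate check (after ThumbExpandImm)."""
    value &= 0xFFFFFFFF
    low = value & 0xFF
    second = (value >> 8) & 0xFF
    # Replicated-byte forms.
    if value <= 0xFF:                 # 0x000000XY
        return True
    if value == low * 0x00010001:     # 0x00XY00XY
        return True
    if value == second * 0x01000100:  # 0xXY00XY00
        return True
    if value == low * 0x01010101:     # 0xXYXYXYXY
        return True
    # Rotated form: 8-bit pattern 1bcdefgh rotated right by 8..31 bits.
    for rot in range(8, 32):
        imm8 = ror32(value, 32 - rot)  # undo the rotate-right by rot
        if 0x80 <= imm8 <= 0xFF:
            return True
    return False


for v in (0xFF00FF00, 0xFFFFFFFF, 0x007F8000, 0xFF0000FF, 0x007F9000):
    print(f"0x{v:08x}", is_thumb2_modified_imm(v))
# True, True, True, False, False
```

0x007f8000 passes through the rotated form (0xff rotated right by 17, an odd amount that ARM32 cannot express), while 0xff0000ff and 0x007f9000 need more than 8 significant bits and match none of the replicated patterns.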

As Jake points out, the 32-bit ARM instructions basically take 8 significant bits rotated by an even number of bit positions (0, 2, 4, 6, ..., 30).

ldr r0,=0x00000081
ldr r0,=0x00000101
ldr r0,=0x00000102
ldr r0,=0x00000204
ldr r0,=0x10000008
ldr r0,=0xEFFFFFF7
ldr r0,=0xFFFFF00F


00000000 <.text>:
   0:   e3a00081    mov r0, #129    ; 0x81
   4:   e59f0010    ldr r0, [pc, #16]   ; 1c <.text+0x1c>
   8:   e59f0010    ldr r0, [pc, #16]   ; 20 <.text+0x20>
   c:   e3a00f81    mov r0, #516    ; 0x204
  10:   e3a00281    mov r0, #268435464  ; 0x10000008
  14:   e3e00281    mvn r0, #268435464  ; 0x10000008
  18:   e3e00eff    mvn r0, #4080   ; 0xff0
  1c:   00000101    .word   0x00000101
  20:   00000102    .word   0x00000102
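The listing above can be reproduced by hand. This Python sketch assumes the ARM-mode data-processing layout for `mov r0, #imm` (0xE3A00000) and `mvn r0, #imm` (0xE3E00000), i.e. condition AL with Rd = r0, and ORs in the 12-bit field. The function names are hypothetical, and it only models the mov / mvn / literal-pool choice seen here, not everything the assembler may do for `ldr rX,=`.

```python
from typing import Optional


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_imm12(value: int) -> Optional[int]:
    """Return an ARM32 imm12 field for `value`, or None if it cannot be encoded."""
    value &= 0xFFFFFFFF
    for rotate in range(16):
        imm8 = ror32(value, 32 - 2 * rotate)  # undo rotate-right by 2*rotate
        if imm8 <= 0xFF:
            return (rotate << 8) | imm8
    return None


MOV_R0_IMM = 0xE3A00000  # MOV r0, #imm: cond=AL, I=1, opcode=MOV, Rd=r0
MVN_R0_IMM = 0xE3E00000  # MVN r0, #imm: same layout, opcode=MVN


def encode_ldr_r0_pseudo(value: int) -> str:
    """Mimic the mov / mvn / literal-pool choice seen in the listing."""
    value &= 0xFFFFFFFF
    field = arm_imm12(value)
    if field is not None:
        return f"{MOV_R0_IMM | field:08x}"
    field = arm_imm12(~value & 0xFFFFFFFF)
    if field is not None:
        return f"{MVN_R0_IMM | field:08x}"
    return "literal pool"


for value in (0x81, 0x101, 0x102, 0x204, 0x10000008, 0xEFFFFFF7, 0xFFFFF00F):
    print(f"0x{value:08x} -> {encode_ldr_r0_pseudo(value)}")
```

Running it gives e3a00081, literal pool, literal pool, e3a00f81, e3a00281, e3e00281 and e3e00eff, the same words as in the disassembly.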

The ARM encodings are easier to understand than the Thumb encodings, and the ARM docs include examples that make both easier to follow.

Since you mentioned 0xFF00FF00, you must be asking about Thumb-2 on armv7-a or armv7-m, yes?