I have this RISC-V assembly program:
```asm
addi x2,x0,5
addi x3,x0,12
addi x7,x3,-9
or x4,x7,x2
and x5,x3,x4
add x5,x5,x4
beq x5,x7,0x48
```
I want to get the assembled instructions in hex format so I can load them into an FPGA. I get those values by running the following script:
```bash
#!/bin/bash
# Given a source assembly file, assemble it and display
# the hex value of each instruction
SRC_FILE=$1
RV_AS=riscv64-unknown-elf-as
RV_OBJCOPY=riscv64-unknown-elf-objcopy
# Assemble only (no link step) for RV32IMA
$RV_AS -o /tmp/gen_asm_instr.elf "$SRC_FILE" -march=rv32ima
# Dump the raw section bytes of the object file
$RV_OBJCOPY -O binary /tmp/gen_asm_instr.elf /tmp/gen_asm_instr.bin
# Print one little-endian 32-bit word per line, keeping only the hex column
xxd -e -c 4 /tmp/gen_asm_instr.bin | cut -d ' ' -f 2
```
If I comment out the last assembly instruction (`beq`), everything works and I get the following result:
```
00500113
00c00193
ff718393
0023e233
0041f2b3
004282b3
```
That's 6 instructions; everything is fine. However, if I uncomment the last instruction, I get:
```
00500113
00c00193
ff718393
0023e233
0041f2b3
004282b3
00729463
0000006f
```
That's 8 instructions. If I disassemble them, I get:
```bash
# Create a file 'template.txt' with the above instructions (xxd dump format)
cat > template.txt <<'EOF'
00000000: 00500113 00c00193 ff718393 0023e233
00000010: 0041f2b3 004282b3 00729463 0000006f
EOF
# Use xxd and objdump to recover the assembly instructions
xxd -r template.txt > a.out    # generate a binary file
xxd -e a.out > a-big.txt       # switch endianness
xxd -r ./a-big.txt > a2.out    # generate a binary file with the switched endianness
riscv64-unknown-elf-objdump -M no-aliases -M numeric -mabi=ilp32 -b binary -m riscv -D ./a2.out  # disassemble it
```
Result:
```
./a2.out: file format binary
Disassembly of section .data:
0000000000000000 <.data>:
0: 00500113 addi x2,x0,5
4: 00c00193 addi x3,x0,12
8: ff718393 addi x7,x3,-9
c: 0023e233 or x4,x7,x2
10: 0041f2b3 and x5,x3,x4
14: 004282b3 add x5,x5,x4
18: 00729463 bne x5,x7,0x20
1c: 0000006f jal x0,0x1c
```
So the RISC-V assembler is splitting the `beq` instruction into two instructions: `bne` and `jal`.
Why does this happen? How can I avoid it?
EDIT: I also tried this online assembler, https://riscvasm.lucasteske.dev/, and the same thing happens.
Answer:
The assembler seems to do this when the branch target is a hard-coded numeric address. I can't explain exactly why it chooses to do that, but note that `jal` has a much longer reach (20-bit immediate, ±1 MiB) than `beq` (12-bit immediate, ±4 KiB). Since both `bxx` and `jal` are PC-relative, neither supports absolute addressing. The assembler might not know where the code will be located, so it gives you the additional range needed to reach that absolute address.
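Decoding the two emitted words by hand shows what the expansion actually does. Below is a minimal bash sketch; the helper names `sext`, `decode_btype` and `decode_jtype` are made up for illustration, and only the RV32I B-type and J-type layouts are handled. It shows that the `bne` at 0x18 skips 8 bytes over the `jal`, and that the `jal` at 0x1c currently has an offset of 0, so it jumps to itself. That zero is presumably a placeholder for a relocation toward 0x48 that a linker would fill in; the script in the question stops after `as` and `objcopy`, so nothing patches it. Running `riscv64-unknown-elf-objdump -dr` on the `.elf` file should list any such pending relocation.

```bash
#!/usr/bin/env bash
# Sketch: decode the two words the assembler emitted in place of
# `beq x5,x7,0x48`. Handles only the RV32I B-type (branch) and
# J-type (jal) formats; it is not a general disassembler.

# Sign-extend the low $2 bits of $1.
sext() {
  local value=$1 bits=$2
  if (( value & (1 << (bits - 1)) )); then
    value=$(( value - (1 << bits) ))
  fi
  echo "$value"
}

# B-type layout: imm[12|10:5] rs2 rs1 funct3 imm[4:1|11] opcode
decode_btype() {
  local w=$(( $1 ))
  local imm12=$(( (w >> 31) & 0x1 ))
  local imm11=$(( (w >> 7) & 0x1 ))
  local imm10_5=$(( (w >> 25) & 0x3f ))
  local imm4_1=$(( (w >> 8) & 0xf ))
  local imm=$(( imm12 << 12 | imm11 << 11 | imm10_5 << 5 | imm4_1 << 1 ))
  local funct3=$(( (w >> 12) & 0x7 ))
  local rs1=$(( (w >> 15) & 0x1f ))
  local rs2=$(( (w >> 20) & 0x1f ))
  local names=(beq bne '?' '?' blt bge bltu bgeu)
  echo "${names[funct3]} x$rs1,x$rs2,$(sext "$imm" 13)"
}

# J-type layout: imm[20|10:1|11|19:12] rd opcode
decode_jtype() {
  local w=$(( $1 ))
  local imm20=$(( (w >> 31) & 0x1 ))
  local imm10_1=$(( (w >> 21) & 0x3ff ))
  local imm11=$(( (w >> 20) & 0x1 ))
  local imm19_12=$(( (w >> 12) & 0xff ))
  local imm=$(( imm20 << 20 | imm19_12 << 12 | imm11 << 11 | imm10_1 << 1 ))
  local rd=$(( (w >> 7) & 0x1f ))
  echo "jal x$rd,$(sext "$imm" 21)"
}

# Offsets are relative to each instruction's own address.
echo "0x18: $(decode_btype 0x00729463)"  # bne x5,x7,8 (skips the jal)
echo "0x1c: $(decode_jtype 0x0000006f)"  # jal x0,0 (jumps to itself)
echo "B-type reach: $(( -(1 << 12) ))..$(( (1 << 12) - 2 )) bytes"
echo "J-type reach: $(( -(1 << 20) ))..$(( (1 << 20) - 2 )) bytes"
```

The reach figures follow from the immediate widths: the B-type offset is a 13-bit signed value with bit 0 always zero (±4 KiB), while the J-type offset is a 21-bit signed value (±1 MiB).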
If you use a label as the branch target, it won't do that as long as the label is within reach.
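With a label in range, the assembler should emit a single B-type instruction whose immediate is the distance from the branch to the label. To double-check the word you load into the FPGA, here is a minimal bash sketch of a B-type encoder. `encode_btype` is a hypothetical helper, and the first call assumes the label ends up 0x30 bytes after the branch (for instance, the branch at 0x18 and the label at 0x48); adjust the offset to your actual layout.

```bash
#!/usr/bin/env bash
# Sketch: encode an RV32I B-type branch by hand.
# Usage: encode_btype FUNCT3 RS1 RS2 OFFSET   (funct3: 0=beq, 1=bne)
# OFFSET is the byte distance from the branch to its target.
encode_btype() {
  local funct3=$1 rs1=$2 rs2=$3 off=$4
  # The offset must be even and fit in a signed 13-bit field.
  if (( off % 2 != 0 || off < -4096 || off > 4094 )); then
    echo "offset $off is out of B-type range" >&2
    return 1
  fi
  local imm=$(( off & 0x1fff ))   # 13-bit two's-complement form
  local imm12=$(( (imm >> 12) & 0x1 ))
  local imm11=$(( (imm >> 11) & 0x1 ))
  local imm10_5=$(( (imm >> 5) & 0x3f ))
  local imm4_1=$(( (imm >> 1) & 0xf ))
  # Scatter the immediate bits into the B-type fields; 0x63 is the BRANCH opcode.
  local w=$(( imm12 << 31 | imm10_5 << 25 | rs2 << 20 | rs1 << 15 | funct3 << 12 | imm4_1 << 8 | imm11 << 7 | 0x63 ))
  printf '%08x\n' "$w"
}

encode_btype 0 5 7 0x30   # beq x5,x7,<label 0x30 bytes ahead>: prints 02728863
encode_btype 1 5 7 8      # bne x5,x7,8: prints 00729463, the word the assembler emitted
```

Because the encoded word depends only on the distance between the branch and its target, not on the load address, a label-based branch within ±4 KiB has no need for the `jal` fallback.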