I am designing my own RISC-V CPU and have implemented a few instructions so far.
I have installed the RV32I version of the GCC toolchain, so I now have the assembler riscv32-unknown-elf-as available.
I'm trying to assemble a program with just one instruction:
# simple.asm
add x5,x6,x7
I assemble this and then run objdump with these commands:
riscv32-unknown-elf-as simple.asm -o simple
riscv32-unknown-elf-objdump -D simple
This prints out the following:
new: file format elf32-littleriscv
Disassembly of section .text:
00000000 <.text>:
0: 007302b3 add t0,t1,t2
Disassembly of section .riscv.attributes:
00000000 <.riscv.attributes>:
0: 2d41 jal 0x690
2: 0000 unimp
4: 7200 flw fs0,32(a2)
6: 7369 lui t1,0xffffa
8: 01007663 bgeu zero,a6,0x14
c: 00000023 sb zero,0(zero) # 0x0
10: 7205 lui tp,0xfffe1
12: 3376 fld ft6,376(sp)
14: 6932 flw fs2,12(sp)
16: 7032 flw ft0,44(sp)
18: 5f30 lw a2,120(a4)
1a: 326d jal 0xfffff9c4
1c: 3070 fld fa2,224(s0)
1e: 615f 7032 5f30 0x5f307032615f
24: 3266 fld ft4,120(sp)
26: 3070 fld fa2,224(s0)
28: 645f 7032 0030 0x307032645f
My questions are:
- What is going on here? I thought I'd have a simple single line of hex, but there's a lot more going on.
- How do I instruct my processor to start reading the instructions at a certain memory address? It looks like objdump also doesn't know where the instructions will begin.
Just to be clear, I'm treating my processor as bare metal at this point. I am imagining I will hardcode in the processor that the instructions start at memory address X and data is available at memory address Y and stack is available at memory address Z. Is this correct? Or is this the wrong approach?
EDIT: @PeterCordes's answer below set me on the right path. I finally figured out how to generate a raw memory image file that I can use.
The steps are as follows:
Modify the assembly file to have a .text and a .data section and a _start label. My simple.asm file now looks as follows:
.globl _start
.text
_start:
    add x5,x6,x7
.data
L1: .word 27
Assemble the .asm to a .o file using the following command:
riscv32-unknown-elf-as simple.asm -o simple.o
Create a linker script for the specific processor. I followed this amazing video, which walks through the process of creating a linker script from scratch. For now, I just need .text and .data sections, so my linker script (mycpu.ld) is as shown below:
OUTPUT_FORMAT("elf32-littleriscv", "elf32-littleriscv", "elf32-littleriscv")
ENTRY(_start)

MEMORY
{
    DATA (rwx) : ORIGIN = 0x0,  LENGTH = 0x80
    INST (rx)  : ORIGIN = 0x80, LENGTH = 0x80
}

SECTIONS
{
    .data : { *(.data) } > DATA
    .text : { *(.text) } > INST
}
Generate the ELF file using riscv32-unknown-elf-gcc, which automatically calls riscv32-unknown-elf-ld:
riscv32-unknown-elf-gcc -nostdlib -T mycpu.ld -o simple.elf simple.o
Create a raw binary file from the .elf file, which I will use to populate the contents of the memory. (Despite the .hex name, -O binary writes raw bytes, not a text hex format.)
riscv32-unknown-elf-objcopy -O binary simple.elf simple.hex
The final simple.hex contains the following (viewed with hexyl):
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 1b 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │•0000000┊00000000│
│00000010│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000020│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000030│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000040│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000050│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000060│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000070│ 00 00 00 00 00 00 00 00 ┊ 00 00 00 00 00 00 00 00 │00000000┊00000000│
│00000080│ b3 02 73 00             ┊                         │×•s0    ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
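where b3 02 73 00 is the instruction word 0x007302b3 (add x5,x6,x7, matching the objdump output above) stored in little-endian byte order, and the 1b at address 0 is the .word 27 from the .data section.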
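For reference, here is a small Python sketch of the layout objcopy -O binary produced here. The output starts at the lowest section address and the gap up to .text is zero-filled, which is where the 0x80 bytes before the instruction come from. The section addresses and contents are taken from mycpu.ld and simple.asm above. Grouping the image into 32-bit little-endian words is only for comparison with the objdump output; it is not something objcopy did here.

```python
# Sketch of how `objcopy -O binary` lays out this program: the image starts
# at the lowest section address and any gap between sections is zero-filled.
SECTIONS = {
    0x00: bytes([27, 0, 0, 0]),             # .data: `.word 27` (little-endian)
    0x80: bytes([0xB3, 0x02, 0x73, 0x00]),  # .text: add x5,x6,x7
}

def flat_image(sections):
    base = min(sections)
    end = max(addr + len(data) for addr, data in sections.items())
    image = bytearray(end - base)           # starts as all zero bytes
    for addr, data in sections.items():
        image[addr - base:addr - base + len(data)] = data
    return bytes(image)

def as_words(image):
    # Group bytes into little-endian 32-bit words.
    return [int.from_bytes(image[i:i + 4], "little") for i in range(0, len(image), 4)]

image = flat_image(SECTIONS)
words = as_words(image)
print(len(image))                  # 132 (0x84), same size as the hexyl dump
print(f"{words[0]:08x}")           # 0000001b: the .word 27 at address 0x0
print(f"{words[0x80 // 4]:08x}")   # 007302b3: the add at address 0x80
```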
And that's it! Big thanks to @PeterCordes for his help! :)
Answer:
How does the processor know which address to start fetching instructions from?
The CPU itself will have some hard-wired address that it fetches from on reset / power-on. Usually a system is designed with ROM or flash at that physical address. (That ROM might hold an ELF program loader which respects the ELF entry-point metadata, or you could just link a flat binary with the right code at the start of the binary.)
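A toy Python model can make this concrete. It assumes a hypothetical reset address of 0x80, chosen only to match the INST region of the asker's mycpu.ld. A real design picks its own reset address and places ROM or flash there. The class and constant names are illustrative, not from any real simulator.

```python
# Toy model of a CPU that fetches its first instruction from a hard-wired
# reset address. RESET_VECTOR = 0x80 is a hypothetical choice matching the
# INST region in the question's mycpu.ld.
RESET_VECTOR = 0x80
MEM_SIZE = 0x100  # 256 bytes: DATA (0x00-0x7f) + INST (0x80-0xff)

class ToyCpu:
    def __init__(self, image: bytes, load_addr: int = 0):
        self.mem = bytearray(MEM_SIZE)
        self.mem[load_addr:load_addr + len(image)] = image  # "ROM" contents
        self.pc = RESET_VECTOR                              # set on reset

    def load_word(self, addr: int) -> int:
        # RISC-V is little-endian: the lowest-addressed byte is least significant.
        return int.from_bytes(self.mem[addr:addr + 4], "little")

    def fetch(self) -> int:
        insn = self.load_word(self.pc)
        self.pc += 4
        return insn

# simple.hex from the question: .word 27 at 0x0, the add at 0x80.
image = bytes([27]) + bytes(0x7F) + bytes.fromhex("b3027300")
cpu = ToyCpu(image)
print(hex(cpu.fetch()))   # 0x7302b3 (add x5,x6,x7)
print(cpu.load_word(0))   # 27
```

In this picture, "hard-coding that instructions start at address X" means the reset value of the PC is X. The linker script is what makes sure the code actually lands there.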
What is going on here? I thought I'd have a simple single line of hex, but there's a lot more going on.
Your objdump -D disassembles all ELF sections, not just .text. As you can see, there is only one instruction in the .text section, and if you used objdump -d that's all you'd see. (I normally use objdump -drwC, although -w (no line-wrapping) is probably irrelevant for RISC-V, unlike x86 where a single insn can be long.)
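The rest of that output is the .riscv.attributes section. It holds metadata rather than code, and objdump -D simply decodes its bytes as if they were instructions. The Python sketch below turns the hex column from the question's dump back into bytes and parses them. It assumes the usual ELF build-attributes layout: an 'A' version byte, a sub-section length, a "riscv" vendor string, and a Tag_File block. It also assumes that odd-numbered tags carry strings while even-numbered tags carry ULEB128 integers, which is enough for the single tag present here. The helper names are hypothetical.

```python
# Hex columns printed by `objdump -D` for .riscv.attributes (from the question).
# Each space-separated group is a little-endian value of 2 or 4 bytes.
OBJDUMP_HEX = [
    "2d41", "0000", "7200", "7369", "01007663", "00000023", "7205",
    "3376", "6932", "7032", "5f30", "326d", "3070", "615f 7032 5f30",
    "3266", "3070", "645f 7032 0030",
]

def to_bytes(columns):
    out = bytearray()
    for col in columns:
        for group in col.split():
            out += int(group, 16).to_bytes(len(group) // 2, "little")
    return bytes(out)

def read_uleb128(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return result, pos

def read_cstr(data, pos):
    end = data.index(b"\0", pos)
    return data[pos:end].decode("ascii"), end + 1

def parse_attributes(data):
    # Only handles the first vendor sub-section and its first tag block.
    assert data[0] == ord("A"), "format-version byte"
    sub_len = int.from_bytes(data[1:5], "little")
    assert len(data) == 1 + sub_len
    vendor, pos = read_cstr(data, 5)
    block_start = pos
    tag = data[pos]
    size = int.from_bytes(data[pos + 1:pos + 5], "little")
    end = block_start + size
    pos += 5
    attrs = {}
    while pos < end:
        attr_tag, pos = read_uleb128(data, pos)
        if attr_tag % 2:  # assumption: odd tags hold NUL-terminated strings
            attrs[attr_tag], pos = read_cstr(data, pos)
        else:             # even tags hold ULEB128 integers
            attrs[attr_tag], pos = read_uleb128(data, pos)
    return vendor, tag, attrs

vendor, tag, attrs = parse_attributes(to_bytes(OBJDUMP_HEX))
print(vendor, tag, attrs[5])  # riscv 1 rv32i2p0_m2p0_a2p0_f2p0_d2p0
```

The decoded string also suggests that this toolchain's default -march included the M, A, F and D extensions. For a CPU that implements only RV32I, passing -march=rv32i to the assembler is one way to make it reject instructions the hardware can't run.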
Would it be possible to pass the file I compiled above as is to my processor?
Not in the way you're probably thinking. Also note that you chose a misleading file name for the output: as produces an object file (normally named .o), not an executable. You could link with ld into a flat binary, or link and then use objcopy to extract the .text section.
(You could in theory put a whole ELF executable or even object file into ROM such that the .text section happens to start where the CPU will fetch from, but nothing will look at the metadata bytes, so the ELF entry-point address metadata in an ELF executable would be irrelevant.)
Difference between a .o and an executable: a .o has relocation metadata telling the linker where to fill in actual addresses. These can be absolute (e.g. lui/addi with %hi/%lo) or PC-relative for auipc (which is what la expands to in non-PIC code), in cases like multiple .o files where one references a symbol from the other. (Otherwise the relative displacement could be calculated at assemble time, not left for link time.)
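As a sketch of what "filling in" a PC-relative relocation involves: auipc adds a 20-bit upper immediate (shifted left by 12) to the pc, and the following addi adds a sign-extended 12-bit low part. The linker therefore has to split the final offset (target minus the address of the auipc) between the two instructions. Because the low part is signed, the upper part must be rounded up when bit 11 is set. This is the job of the %pcrel_hi/%pcrel_lo operators. The function names below are hypothetical.

```python
from typing import Tuple

def split_pcrel(offset: int) -> Tuple[int, int]:
    """Split offset = target - pc(auipc) into auipc's hi20 and addi's lo12."""
    hi20 = (offset + 0x800) >> 12       # round so the signed lo12 fits
    lo12 = offset - (hi20 << 12)        # always in -2048..2047
    assert -(1 << 19) <= hi20 < (1 << 19), "target out of auipc range"
    return hi20 & 0xFFFFF, lo12         # auipc's immediate is a raw 20-bit field

def rebuild(hi20: int, lo12: int) -> int:
    """What auipc+addi add to pc: sext(hi20 << 12) + sext(lo12)."""
    upper = hi20 << 12
    if upper & 0x8000_0000:             # bit 31 set: negative 32-bit value
        upper -= 1 << 32
    return upper + lo12

print(split_pcrel(0x1234))   # (1, 564): 0x1000 + 0x234
print(split_pcrel(0x800))    # (1, -2048): rounded up, lo12 is negative
```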
So if you had code that used any labels for memory addresses, you'd need the linker to fill in those relocation entries in your code. Then you could objcopy some sections out of a linked ELF executable, or use a linker script to set the layout for your flat binary.
For your simple case with only an add, no la or anything, there are no relocation entries, so the .text section in the .o is the same as in a linked executable.
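To see why the add needs no relocation: every field of an R-type instruction is either a register number or a fixed opcode/function code, so the assembler can produce the final word on its own. A minimal Python encoder reproduces the word from the objdump output and the bytes in the asker's simple.hex. The helper names are hypothetical, and the field layout follows the RV32I R-type format.

```python
# Encode an RV32I R-type instruction:
# funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
OP = 0b0110011  # major opcode shared by add, sub, and, or, ...

def encode_r(funct7, rs2, rs1, funct3, rd, opcode=OP):
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_add(rd, rs1, rs2):
    # add: funct7 = 0000000, funct3 = 000
    return encode_r(0b0000000, rs2, rs1, 0b000, rd)

word = encode_add(rd=5, rs1=6, rs2=7)         # add x5, x6, x7
print(f"{word:08x}")                          # 007302b3, as in objdump
print(word.to_bytes(4, "little").hex(" "))    # b3 02 73 00, as in simple.hex
```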
Also tricky to get right with objcopy is static data, e.g. the .data and .bss sections. If you copy just the .text section to a flat binary, you won't have data anywhere. (But in a ROM, you'd need a startup function that copies static initializers from ROM to RAM for .data, and zeros the .bss space. If you want to write the asm source with a normal-looking .data section containing non-zero values, you'd want your build scripts to figure out the size to copy so your startup function can use it, instead of having to do all that manually.)
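The logic of such a startup function can be sketched in Python against a flat bytearray standing in for memory. The symbol names (_sidata, _sdata, _edata, _sbss, _ebss) and the ROM/RAM addresses are hypothetical. On real hardware this would be a few lines of asm run before the main code, and the linker script would provide those addresses (in GNU ld, for example, with a `> RAM AT> ROM` placement and LOADADDR).

```python
# Model of a bare-metal startup routine. The symbol names are hypothetical:
# a linker script would define them so the startup code knows where .data's
# initial values live in ROM and where .data/.bss live in RAM.
def crt0_init(mem, syms):
    src = syms["_sidata"]
    # Copy .data initializers from ROM (load address) to RAM (run address).
    for dst in range(syms["_sdata"], syms["_edata"]):
        mem[dst] = mem[src]
        src += 1
    # Zero .bss, which takes no space in the ROM image.
    for addr in range(syms["_sbss"], syms["_ebss"]):
        mem[addr] = 0

# Hypothetical layout: ROM at 0x000-0x0ff, RAM at 0x100-0x1ff.
mem = bytearray(0x200)
mem[0x40:0x44] = (27).to_bytes(4, "little")   # .word 27 stored in ROM
mem[0x104:0x108] = b"\xff" * 4                # RAM garbage where .bss will be
syms = {"_sidata": 0x40, "_sdata": 0x100, "_edata": 0x104,
        "_sbss": 0x104, "_ebss": 0x108}
crt0_init(mem, syms)
print(int.from_bytes(mem[0x100:0x104], "little"))  # 27
print(mem[0x104:0x108].hex())                      # 00000000
```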