section .data will not fit in region 'dff' error


I was trying to run a C program on a RISC-V processor and got this:

/foss/tools/riscv-gnu-toolchain-rv32i/217e7f3debe424d61374d31e33a091a630535937/lib/gcc/riscv32-unknown-linux-gnu/11.1.0/../../../../riscv32-unknown-linux-gnu/bin/ld: test_la.elf section `.data' will not fit in region `dff'
/foss/tools/riscv-gnu-toolchain-rv32i/217e7f3debe424d61374d31e33a091a630535937/lib/gcc/riscv32-unknown-linux-gnu/11.1.0/../../../../riscv32-unknown-linux-gnu/bin/ld: region `dff' overflowed by 1624 bytes
collect2: error: ld returned 1 exit status

According to a comment in this thread, it might be caused by declaring some large global arrays. That is true in my case; I have these declared globally (outside the main function):

int sig_A [Bits] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};

int sig_B [Bits] = { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

int sig_C [Bits] = { 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0};

int data_i [Bits] = { 0, 0, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 6, 0, 0};

I need this data to be sent from the RISC-V core to some digital circuit. How can I make this happen (if there is a way)? Thank you.

Linker script:

/* Copyright lowRISC contributors.
   Licensed under the Apache License, Version 2.0, see LICENSE for details.
   SPDX-License-Identifier: Apache-2.0 */

INCLUDE ../generated/output_format.ld

OUTPUT_ARCH(riscv)

/*******
MEMORY
{

   Change this if you'd like different sizes. Arty A7-100(35) has a maximum of 607.5KB(225KB)
   BRAM space. Configuration below is for maximum BRAM capacity with Arty A7-35 while letting
   CoreMark run (.vmem of 152.8KB).
    ram         : ORIGIN = 0x00100000, LENGTH = 0x30000 * 192 kB *
    stack       : ORIGIN = 0x00130000, LENGTH = 0x8000  * 32 kB *
}
**********/

_entry_point = _vectors_start + 0x80;
ENTRY(_entry_point)

/* The tohost address is used by Spike for a magic "stop me now" message. This
   is set to equal SIM_CTRL_CTRL (see simple_system_regs.h), which has that
   effect in simple_system simulations. Note that it must be 8-byte aligned.

   We don't read data back from Spike, so fromhost is set to some dummy value:
   we place it just above the top of the stack.
 */
tohost   = 0x20008;
fromhost = _stack_start + 0x10;

SECTIONS
{
    .vectors :
    {
        . = ALIGN(4);
        _vectors_start = .;
        KEEP(*(.vectors))
        _vectors_end = .;
    } > flash

    .text : {
        . = ALIGN(4);
        *(.text)
        *(.text.*)
    }  > flash

    .rodata : {
        . = ALIGN(4);
        /* Small RO data before large RO data */
        *(.srodata)
        *(.srodata.*)
        *(.rodata);
        *(.rodata.*)
    } > flash

    .data : {
        . = ALIGN(4);
        /* Small data before large data */
        *(.sdata)
        *(.sdata.*)
        *(.data);
        *(.data.*)
    } > dff AT > flash

    .bss :
    {
        . = ALIGN(4);
        _bss_start = .;
        /* Small BSS before large BSS */
        *(.sbss)
        *(.sbss.*)
        *(.bss)
        *(.bss.*)
        *(COMMON)
        _bss_end = .;
    } > dff

}

PROVIDE(_stack_start = ORIGIN(sram) + LENGTH(sram));

CodePudding user response:

could you please point me to a link to understand the linker script myself? – temp1445

I did a web search on "linker script" documentation and got (among 1.2M hits): https://wiki.osdev.org/Linker_Scripts

I downloaded the repo you linked to [all 14 GB of it ;-)]. In the repo, we have a few files:

./verilog/dv/caravel/sections.lds
./verilog/dv/caravel/mgmt_soc/irq/sections.lds

They are similar except for the origin address of RAM. Here is a snippet of the first:

MEMORY {
    FLASH (rx)  : ORIGIN = 0x10000000, LENGTH = 0x400000    /* 4MB */
    RAM(xrw)    : ORIGIN = 0x00000000, LENGTH = 0x0400      /* 256 words (1 KB) */
}

This means that you only have 1 KB of ram!
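If the underlying design actually provides more on-chip RAM than the script declares (an assumption; on a fixed caravel harness it may well not), the region could in principle be enlarged in the MEMORY block. A hypothetical sketch:

```
MEMORY {
    FLASH (rx)  : ORIGIN = 0x10000000, LENGTH = 0x400000    /* 4MB */
    RAM (xrw)   : ORIGIN = 0x00000000, LENGTH = 0x1000      /* 4 KB instead of 1 KB */
}
```

But LENGTH must match the RAM actually instantiated in the hardware; a larger number in the linker script alone only silences the linker, while accesses beyond the real RAM will fail at run time.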

But, your example refers to dff. There is a python script that refers to a linker script by that name, but it must be [auto-]generated as there is no dff.lds in the tree.


AFAICT, caravel is a gate-level Verilog/VHDL design for a RISC-V FPGA or ASIC implementation.

I have done [linux] S/W kernel porting to a new ASIC CPU design [20 years ago], using a gate level simulator.

But, this can be very slow.

For example, during boot, linux would go into a spin loop to calculate CPU speed (i.e. BogoMIPS). This test, on real H/W takes a small fraction of a second (e.g. 1ms). On the simulator, it took one full day.

So, I replaced that loop with a no-op and hardwired the frequency, e.g.:

#ifdef _USE_GATE_SIMULATOR_
    bogoMIPS = 200;     /* hardwired value for the gate-level simulator */
#else
    while (...) {       /* the original calibration spin loop */
    }
    bogoMIPS = ...;
#endif

Then, we had to develop S/W device drivers for the [proposed] H/W. The verilog code for some of the H/W wasn't complete.

But, we did have functional specs for the H/W.

Our solution was to do most of the S/W development on:

  1. A functional emulator (similar to qemu) to which we added emulation of the H/W devices.
  2. Userspace code, run on the development machine (e.g. x86 PC), that ran x86 code and had a library that allowed us to make calls into the H/W emulation functions.
  3. A real H/W SDK board that had similar H/W.
  4. The gate level simulator.

We configured the S/W to be able to work in any of these environments with conditional compilation.

We did most of the S/W development work using (2) and (1). When we were 99.44% sure that there were no software bugs (e.g. UB, etc.), only then would we do a full verification using (3).

That way, we didn't burn hours of simulator time to find a [trivial] S/W bug that was found in 5 seconds on the SDK board or emulator.

This also allowed the programmers to be able to debug the S/W independently of the H/W designers verilog code [which often had more bugs than the S/W did].


You didn't specify what your primary goal was, but I'd highly recommend a build environment like that.

If you're just wanting to experiment with the riscv architecture at the assembly level, then I'd use qemu or equivalent.

If you're trying to write S/W to drive current/proposed H/W devices, I'd still recommend a similar strategy to what we had.

CodePudding user response:

Your program defines some big arrays, which are bigger than the available RAM. The error message says (with the linker's path prefix trimmed):

test_la.elf section `.data' will not fit in region `dff'
region `dff' overflowed by 1624 bytes

The section .data collects all writable static* variables, and this includes the arrays. Since this section is allocated in the region dff, it will overflow.

Now you have at least two options.

1. Reduce the size of the arrays

You say in a comment that all values in the arrays are as small as shown, in the range 0 to 6. The element data type is int, which seems to have a size of 4 bytes. This is a lot for storing small values.

You can invent some way to pack only needed bits into available space, but this needs some unpacking code at run time, depending on the purpose of the values.

A simpler way is to use the smallest possible and appropriate data type, in your case uint8_t. You did this as you confirm in a comment, and it solved the issue for the moment.

However, as soon as you add more variables to your program, the same issue will arise again. Therefore, I suggest using the second option.

Depending on the usage of the values, you can additionally combine both options, minimizing the footprint of the arrays in ROM, too.

2. Move the arrays to ROM

As you confirm in a comment, your program only reads the values of these arrays, their contents are fixed.

Add the modifier const to them. This will put them into the section .rodata, which means "read-only data". As this section is allocated in the region flash, it will not fill up your scarce RAM.

Note: Be aware that some processors might need special instructions to read data from ROM. Depending on the necessity and the cleverness of the compiler, this might need changes to your code, or not. Just saying.

Anyway, you should always use modifiers to show such characteristics. Beginner's courses rarely get to this, and on PCs you can get away without it. But in the embedded world, always modify read-only variables with const.


Here is a hint to check the memory usage: Let the linker generate a map file. For GNU based tool chains, look up the option -Map. For example, you can add -Wl,-Map=output.map to the command line, if you run the linker via the frontend "gcc".


*) Global variables are also static.
