Something really strange is happening here: I can't see the full stack trace with the 'bt' command in gdb. I tried with fresh linux-5.10.112 and qemu-6.2.0 sources and it happens there too! (It does not happen with linux-5.4.21 with defconfig, with either qemu 5.1.0 or 6.2.0.)
I would be grateful if somebody could check if this happens to other people or just me.
- download the linux-5.10.112 tarball from https://www.kernel.org/
- uncompress it, set the env variables ARCH=arm64 and CROSS_COMPILE=aarch64-none-elf-, then do "make defconfig" and "make -j`nproc` Image"
- download qemu-6.2.0 from https://www.qemu.org/
- uncompress it and do "mkdir build", "cd build", "../configure --target-list=aarch64-softmmu --enable-debug"
- run qemu and wait for the debugger to attach:
  qemu-6.2.0/build/aarch64-softmmu/qemu-system-aarch64 -machine virt,gic-version=max,secure=off,virtualization=true -cpu max -kernel linux-5.10.112/arch/arm64/boot/Image -m 2G -nographic -netdev user,id=vnet,hostfwd=:127.0.0.1:0-:22,tftp=/srv/tftp -device virtio-net-pci,netdev=vnet -machine iommu=smmuv3 --append "root=/dev/ram init=/init nokaslr earlycon ip=dhcp hugepages=16" -s -S
- run the debugger with "aarch64-none-elf-gdb linux-5.10.112/vmlinux -x gdb_script", where gdb_script contains:
  target remote :1234
  layout src
  b start_kernel
  b __driver_attach
Now, in gdb, press 'c' twice: the first stop is at start_kernel and the second is at the first __driver_attach call.
When you are at __driver_attach, type 'bt' and check whether you see the full function stack trace.
This is what I see.
(gdb) bt
#0 __driver_attach (dev=0xffff000002582810, data=0xffff800011dc2358 <dummy_regulator_driver+40>)
at drivers/base/dd.c:1060
#1 0xffff8000107a3ed0 in bus_for_each_dev (bus=<optimized out>, start=<optimized out>,
data=0xffff800011dc2358 <dummy_regulator_driver+40>, fn=0xffff8000107a6f60 <__driver_attach>)
at drivers/base/bus.c:305
#2 0xd6d78000107a5c58 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I used to see more than 20 stack frames but strangely I see only two.
I can still see many stack frames with linux-5.4.21, which I was working with in the past.
Could anyone check if this happens to anyone else too?
Even though I can't see all the stack frames, I think that if I enable BLK_DEV_RAM and set initramfs.cpio.gz in the linux build, the kernel will boot fine to the shell prompt. So linux itself runs OK; only gdb can't show the stack frames.
My OS : ubuntu-20.04 5.13.0-35-generic
$ aarch64-none-elf-gdb --version
GNU gdb (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.1.90.20201028-git Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
It looks like, as the kernel version increases, at some point something breaks the gdb 'bt' command?
ADD
I found that CONFIG_DEBUG_FRAME_POINTER and CONFIG_DEBUG_INFO are already set by default. I also tried adding CONFIG_DEBUG_KERNEL, CONFIG_KGDB, CONFIG_GDB_SCRIPTS and CONFIG_STACKTRACE, all to no avail. I need this to work for the arm64 qemu virt machine.
ADD2 01:10 4/26/2022 UTC
In another run, stopped at a breakpoint in __driver_attach, I found this:
(gdb) bt
#0 __driver_attach (dev=dev@entry=0xffff0000401d1810, data=data@entry=0xffff800011bbbbb8 <mxc_gpio_driver+40>) at drivers/base/dd.c:1046
#1 0xffff8000107684f8 in bus_for_each_dev (bus=0xffff800011cba910 <platform_bus_type>, start=0x0, data=0xffff800011bbbbb8 <mxc_gpio_driver+40>, fn=0xffff80001076b860 <__driver_attach>) at drivers/base/bus.c:307
#2 0xb8cd80001076a594 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) x/5g $sp
0xffff800011dcbcc0: 0xffff800011dcbd20 0xb8cd80001076a594
0xffff800011dcbcd0: 0xffff80001076b860 0xffff800011bbbbb8
0xffff800011dcbce0: 0x0000000000000000
Because this is right after the pc reached __driver_attach, the sp has not yet been updated from the previous function (bus_for_each_dev). The first two values at $sp are supposed to be the fp and lr saved by the previous function (see "understanding aarch64 assembly function call, how is stack operated": arm64 stores the caller's fp and lr at the bottom of the new stack frame as it enters a function). The lr (link register, the address to return to after bus_for_each_dev finishes) is 0xb8cd80001076a594, which is weird (not a kernel address). The following 3 values are function arguments for bus_for_each_dev and they look correct.
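The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names are made up here, and it assumes a 48-bit kernel virtual address space (so a valid kernel pointer has 0xffff in bits 63:48), which is what the addresses in the dump suggest.

```python
# Assumption: 48-bit kernel VAs, so bits 63:48 of a kernel pointer are all ones.
KERNEL_TOP16 = 0xFFFF

def parse_gdb_words(dump):
    """Collect the 64-bit words from gdb `x/Ng` output, skipping the address column."""
    words = []
    for line in dump.strip().splitlines():
        _, _, values = line.partition(":")
        words.extend(int(tok, 16) for tok in values.split())
    return words

def looks_like_kernel_address(value):
    """True if the upper 16 bits are 0xffff (hypothetical sanity check)."""
    return (value >> 48) == KERNEL_TOP16

# The `x/5g $sp` output from the post.
dump = """
0xffff800011dcbcc0: 0xffff800011dcbd20 0xb8cd80001076a594
0xffff800011dcbcd0: 0xffff80001076b860 0xffff800011bbbbb8
0xffff800011dcbce0: 0x0000000000000000
"""

words = parse_gdb_words(dump)
saved_fp, saved_lr = words[0], words[1]  # frame record: fp first, then lr

for name, value in (("fp", saved_fp), ("lr", saved_lr)):
    status = "ok" if looks_like_kernel_address(value) else "suspicious upper bits"
    print(f"saved {name} = {value:#018x} ({status})")
```

Only the saved lr is flagged; the saved fp and the function arguments that follow it pass the same check.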
ADD (08:20 27/04/2022 UTC)
I tried to break at driver_attach. It calls bus_for_each_dev, and bus_for_each_dev calls __driver_attach. When I entered bus_for_each_dev, I checked the assembly code. It stores x29 and x30 at the bottom of its new frame (stp x29, x30, [sp, #-80]!), so I checked the values of x29 (fp) and x30 (lr). They were 0xffff800011efbd20 and 0xffff8000107a52f8 respectively. Those values were placed at the bottom of the stack frame of bus_for_each_dev. Then, from inside bus_for_each_dev, I entered __driver_attach. At this point I checked the two values at $sp (the sp value is still that of bus_for_each_dev). They were 0xffff800011efbd20 (correct) and 0xc9a48000107a52f8 (wrong!). Why did the upper 16 bits change??
I soon found that when x29 and x30 are written at the bottom of the new stack frame, the upper 16 bits of x30 are already written with wrong values in the first place. If I fix these 16 bits to the correct value (usually 0xffff, because the top bits of a kernel address are 0xffff), the bt output shows more. The more saved x30 values I fix, the more stack frames I can see. I have filed a bug at bugs.linaro.org so that an expert can check this.
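The manual fix above amounts to forcing the upper 16 bits of each saved x30 back to 0xffff. A minimal Python sketch of that arithmetic, assuming 48-bit kernel virtual addresses (the function name is made up for illustration):

```python
MASK64 = (1 << 64) - 1

def force_kernel_upper_bits(value, va_bits=48):
    """Set every bit above the VA width to 1, as expected for a kernel address.

    Assumption: `value` is a kernel pointer whose upper bits were altered and
    the kernel uses `va_bits`-bit virtual addresses (48 here).
    """
    upper = MASK64 & ~((1 << va_bits) - 1)
    return value | upper

# Saved lr values seen in the gdb sessions in this post.
for corrupted in (0xb8cd80001076a594, 0xc9a48000107a52f8, 0xd6d78000107a5c58):
    print(f"{corrupted:#018x} -> {force_kernel_upper_bits(corrupted):#018x}")
```

For the second value the result is 0xffff8000107a52f8, which is the x30 value read from the register just before it was stored.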
Answer:
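I just found out that I can avoid this problem by turning off CONFIG_ARM64_PTR_AUTH (one of the ARMv8.3 architectural features) when building linux. (I had noticed the instruction 'pacia' at the start of the function's assembly code.) That matches what I saw: with pointer authentication, the return address is signed and the signature is kept in its otherwise unused upper bits before it is saved on the stack. (I asked the kernelnewbies and qemu-discuss mailing lists, but experts don't respond often.) Hope this is helpful to someone later.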
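To confirm which way the option is set in a given build, the kernel .config can be inspected. The sketch below is a hypothetical helper that works on .config-style text. The two sample strings are made-up fragments; in practice the text would be read from the .config in the kernel build directory.

```python
def config_value(config_text, symbol):
    """Return the value of a Kconfig symbol from .config-style text.

    Returns the string after '=' (e.g. 'y'), 'n' if the symbol is explicitly
    marked "is not set", or None if the symbol does not appear at all.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None

# Made-up .config fragments for illustration only.
with_ptr_auth = "CONFIG_ARM64=y\nCONFIG_ARM64_PTR_AUTH=y\n"
without_ptr_auth = "CONFIG_ARM64=y\n# CONFIG_ARM64_PTR_AUTH is not set\n"

print(config_value(with_ptr_auth, "CONFIG_ARM64_PTR_AUTH"))
print(config_value(without_ptr_auth, "CONFIG_ARM64_PTR_AUTH"))
```

If the helper reports 'y' for CONFIG_ARM64_PTR_AUTH, the saved return addresses can be expected to carry altered upper bits, as in the backtraces above.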