Why is printk with const char * accessed through a struct preventing driver from being removed?

Time:08-08

I've been tinkering with Linux drivers, specifically the pcspkr driver. To begin with I just copied and pasted the original driver from my machine and replaced the event handler with some heavy logging as part of an investigative effort. Reduced down to a minimal sample, here's the handler code:

static int pcspkr_event(struct input_dev *dev, unsigned int type,
                unsigned int code, int value)
{
        static int number_of_calls = 0;
        printk(KERN_DEBUG "Starting to beep! This is beep number %d.\n", ++number_of_calls);
        printk(KERN_DEBUG "Input device name: %s\n", dev->name);   // The problem line
        return 0;
}

It's the third line in that method that's the problem. Perhaps I should be checking dev for null (is it even possible for this to be called in that case?), but that's not the issue.

Building and loading the driver works fine, and if I then run $ echo -e \\a and take a peek at # dmesg -t I see this:

[  171.903667] Starting to beep! This is beep number 1.
[  171.903673] Input device name: PC Speaker
[  172.006138] Starting to beep! This is beep number 2.
[  172.006145] Input device name: PC Speaker
[  172.006154] Starting to beep! This is beep number 3.
[  172.006156] Input device name: PC Speaker

I'm not sure yet why it calls the method three times, but that's beside the point.

The problem arises when I try to remove the module with # rmmod pcspkr. Instead of just returning silently, it outputs "Killed" to the console. If I then go into # dmesg -t I see this:

Input device name: PC Speaker
Starting to beep! This is beep number 4.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 661 Comm: rmmod Tainted: G           OE     5.18.11-arch1-1 #1 50398f5e5a828e0d0e099049385fd5e709a30e3e
Hardware name: LENOVO 20YGCTO1WW/20YGCTO1WW, BIOS R1OET28W (1.07 ) 07/15/2021
RIP: 0010:pcspkr_event+0x28/0x3c [pcspkr]

(followed by the rest of the oops dump)

I'm not sure why the event handler is triggered when removing the module, but I'm not sure that has anything to do with it. If I remove the logging line I marked as problematic, the module can be removed with no issue.

Once this error had occurred, attempting to load the module again would just freeze up the console indefinitely. I also tried calling # rmmod pcspkr again, which had no effect, and # rmmod -f pcspkr, which had this output:

rmmod: ERROR: could not remove 'pcspkr': Device or resource busy
rmmod: ERROR: could not remove module pcspkr: Device or resource busy

In a way that's a helpful explanation of the issue, except I have absolutely no idea why that printk line results in the resource getting locked up.

I've searched both Stack Overflow and elsewhere for anything that might be helpful, but haven't found anything. Please let me know if you have any hints, however vague, that might be worth looking into. Thanks in advance.

CodePudding user response:

The described symptoms are explained by pcspkr_event being called with a null pointer for dev:

  • When dev is null, passing dev->name to printk attempts to dereference a null pointer, resulting in the message “BUG: kernel NULL pointer dereference, address: 0000000000000000”.
  • When this printk line is removed, the error does not occur, and removing the module succeeds.

Therefore, testing whether dev is null and skipping the dereference in that case would solve the immediate problem.
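
A minimal sketch of that check follows. It is written as a userspace stand-in so it compiles outside the kernel: struct input_dev here is a hypothetical one-field mock, printf stands in for printk, and safe_dev_name is a helper name introduced only for illustration. If I recall the upstream pcspkr.c correctly, its remove and shutdown paths call pcspkr_event(NULL, EV_SND, SND_BELL, 0) to switch the speaker off, which would explain where the null dev comes from. Treat that as an assumption to verify against your kernel's source.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct input_dev;
 * only the one field used by the handler is modelled. */
struct input_dev {
        const char *name;
};

/* Illustrative helper (not a kernel API): returns a printable name,
 * falling back to a placeholder when dev or dev->name is NULL. */
static const char *safe_dev_name(const struct input_dev *dev)
{
        if (!dev || !dev->name)
                return "(no device)";
        return dev->name;
}

/* Same shape as the handler in the question, with printf standing in
 * for printk so the sketch can be built as an ordinary program. */
static int pcspkr_event(struct input_dev *dev, unsigned int type,
                unsigned int code, int value)
{
        static int number_of_calls = 0;

        (void)type;
        (void)code;
        (void)value;

        printf("Starting to beep! This is beep number %d.\n", ++number_of_calls);
        /* dev is only dereferenced inside safe_dev_name, after the NULL check. */
        printf("Input device name: %s\n", safe_dev_name(dev));
        return 0;
}
```

In the actual module the same guard would go around the dev->name access in the second printk call, for example by passing dev ? dev->name : "(no device)" as the argument.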
