Why does 'require' keep on loading files as strings in the heap even though it's remo

Time:04-15

I have a Node.js application that works like a task manager. Every second (using an rxjs timer), it fetches the scheduled tasks from the database. Every task has a filename, and that file is loaded with require. Every file must have a run() method that executes the task. After the task completes, the required file is removed from require.cache. The application fails after a couple of days because the heap is full. I checked with node --inspect and saw that the required files are retained in the heap as strings.

Can somebody explain to me why the required files are kept as strings on the heap even though the file is removed from require.cache? Even better would be an explanation of how to solve it.

Below is a shortened version of my code

    async test() {
        timer(0, 1000).pipe(
            exhaustMap(async () => {
                return await this.executeTasks();
            })
        ).subscribe();
    }

    async executeTasks(): Promise<void> {
        // get all queue items
        const tasks = await this.getTasks();

        // for each item
        for (const task of tasks) {
            // execute item
            await this.executeTask(task);
        }

    }

    async executeTask(item: QueueItem) {

        try {
            const Callback: typeof AutomationCallback = require(item.callbackFile);
            const callback = new Callback();

            await callback.run();
        } catch (error) {
            if (error instanceof Error && (error as any).code === 'MODULE_NOT_FOUND') {
                log_message('Failed executing', item.callback, item.id, `Cannot find file ${item.callbackFile}`);
            } else if (error instanceof RescheduleError) {
                log_message('Failed executing', item.callback, item.id, error.message);
            } else {
                log_message('Failed executing', item.callback, item.id, (error as any).toString());
            }

        } finally {
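            // remove the module from the require cache
            try {
                delete require.cache[require.resolve(item.callbackFile)];
            } catch (error) {
                // require.resolve throws if the file cannot be found; nothing to remove in that case
            }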
        }
    }

Below is a screenshot of a heap snapshot. The highlighted file is added to the heap every time the file is loaded with require. As the left side shows, the heap slowly fills up; the time between snapshots 1 and 2 is probably about 5 minutes.

[Screenshot of the heap]

CodePudding user response:

If all the task files are known in advance, why not require them at the top and call them conditionally?

Example:

class TaskRunnerBase {
  async run() { await this.doRun(); } // await so that errors from doRun() reach the caller
  async doRun() {} // implemented by specific runners
}
class TaskRunnerA extends TaskRunnerBase {
  async doRun() { /* ... */ } // custom implementation for TaskRunnerA
}
class TaskRunnerB extends TaskRunnerBase {
  async doRun() { /* ... */ } // custom implementation for TaskRunnerB
}
class DefaultRunner extends TaskRunnerBase {
  async doRun() { console.log('Default Runner'); }
}

TaskManager

const TaskRunnerA = require('./task-runners/task-runner-a');
const TaskRunnerB = require('./task-runners/task-runner-b');
// more runners
const DefaultRunner = require('./task-runners/default-runner');

...
...
async executeTask(item: QueueItem) {

        try {
            const Callback: typeof AutomationCallback = this.getRunnerByType(item.callbackFile);
            const callback = new Callback();

            await callback.run();
        } catch (error) {
            ...
        } finally {
            ...
        }
    }
    

getRunnerByType(type) {
  if(type === 'RunnerA') { // "RunnerA" for example
    return TaskRunnerA;
  }
  else if(type === 'RunnerB') { // "RunnerB" for example
    return TaskRunnerB;
  }
  ....
  else {
    return DefaultRunner; // If unknown type is passed, let's keep going, with DefaultRunner.
  }
}

This kind of approach is simple, manageable, and a no-brainer in the long run.
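As the number of runners grows, the if/else chain in getRunnerByType can be swapped for a lookup table. The following is a minimal sketch of that variation, not the answer's exact code: the runner bodies are placeholders, run() returns a string only so the result is easy to inspect, and the keys 'RunnerA' and 'RunnerB' are the illustrative type names used above.

```typescript
// Hypothetical runner classes; bodies are placeholders for illustration.
abstract class TaskRunnerBase {
  async run(): Promise<string> {
    return this.doRun();
  }
  protected abstract doRun(): Promise<string>;
}

class TaskRunnerA extends TaskRunnerBase {
  protected async doRun(): Promise<string> {
    return 'ran A';
  }
}

class TaskRunnerB extends TaskRunnerBase {
  protected async doRun(): Promise<string> {
    return 'ran B';
  }
}

class DefaultRunner extends TaskRunnerBase {
  protected async doRun(): Promise<string> {
    return 'ran default';
  }
}

type RunnerCtor = new () => TaskRunnerBase;

// One entry per known task type; adding a runner means adding a line here.
const runnerRegistry = new Map<string, RunnerCtor>([
  ['RunnerA', TaskRunnerA],
  ['RunnerB', TaskRunnerB],
]);

// Unknown types fall back to DefaultRunner, as in the if/else version.
function getRunnerByType(type: string): RunnerCtor {
  return runnerRegistry.get(type) ?? DefaultRunner;
}

// Usage: look up the class, instantiate it, and run it.
const Runner = getRunnerByType('RunnerA');
new Runner().run().then((result) => console.log(result));
```

Because every runner is imported once at startup, nothing is loaded per task, so no per-task module source can accumulate on the heap.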

CodePudding user response:

require.cache is not the only place where loaded modules are cached. There are a few more, depending on the type of module and how it is loaded, and some of these objects are not even exposed by Node.js, meaning that you could not delete them even if you wanted to.

A well-documented example is module.children, an array of loaded modules. If you look at the length of this array, you'll see that it gets larger and larger after each require, because you are effectively loading the same modules over and over again instead of reusing the cached objects:

const Callback: typeof AutomationCallback = require(item.callbackFile);
console.log('Loaded modules:', module.children.length); // <-
const callback = new Callback();

await callback.run();

You could of course clear the children list (which is not used as a cache anyway) together with the require cache:

// No guarantee that this will help.
delete require.cache[require.resolve(item.callbackFile)];
module.children.pop();

I even wrote my own tool to do something like that in a way that doesn't affect the rest of the program.

Doing so may or may not help. In fact, it could even be counterproductive if the sole goal is to prevent memory leaks, because resources are allocated to load the module every time it is required.

Your best option is probably to get rid of the finally block and leave require.cache unchanged. That way, each module will be loaded at most once.
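The "loaded at most once" behavior can be modeled with a small memoizing wrapper. This is only an illustrative sketch: plain require already behaves this way when require.cache is left alone, and the loader is injected here so the example runs without task files on disk. The names createCachedLoader, FakeTask, and the path './tasks/example' are hypothetical.

```typescript
type TaskCtor = new () => { run(): Promise<void> };
type Loader = (file: string) => TaskCtor;

// Wraps a loader so that each file is loaded once and then reused,
// which is what an untouched require.cache does for require().
function createCachedLoader(load: Loader): Loader {
  const loaded = new Map<string, TaskCtor>();
  return (file) => {
    let ctor = loaded.get(file);
    if (ctor === undefined) {
      ctor = load(file);
      loaded.set(file, ctor);
    }
    return ctor;
  };
}

// Demo with a fake loader that counts how often it is invoked.
let loadCount = 0;

class FakeTask {
  async run(): Promise<void> {
    // placeholder task body
  }
}

const loadTask = createCachedLoader(() => {
  loadCount += 1;
  return FakeTask;
});

loadTask('./tasks/example');
loadTask('./tasks/example');
console.log(loadCount); // 1
```

The trade-off is that edits to a task file are not picked up until the process restarts, which is usually acceptable for a long-running scheduler.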
