How to filter bots on Express.js server

I have created an Express Node.js API and deployed it to AWS (Elastic Beanstalk with 2 EC2 instances). I am using the morgan-body package to log the requests and responses on my endpoints, but it seems that tons of bots are "attacking" my API, and this results in millions of logs every month, which costs me a fortune with Datadog. I have used morgan-body's built-in "skip" feature to filter requests based on their user agents, but new ones seem to appear every day. Is there a way to skip logging for all kinds of bots without checking them one by one? Here is my code, many thanks for your help! :)

```
morganBody(app, {
  // Return true to skip logging for this request.
  skip: (req, res) => {
    const userAgent = req.get('user-agent');
    if (!userAgent) {
      return false;
    }
    // User-agent prefixes seen on bot traffic so far.
    const skippedPrefixes = [
      'ELB-HealthChecker',
      'Mozilla',
      'Mozlila',
      'Python',
      'python',
      'l9explore',
      'Go-http-client',
    ];
    return skippedPrefixes.some((prefix) => userAgent.startsWith(prefix));
  },
  logRequestBody: false,
  logResponseBody: false,
});
```

CodePudding user response：

I figured out part of the answer, by simply skipping all GET requests:

if (req.method === "GET") {
    return true
}

But I am still getting some POST requests from bots, which increase my log volume, and I still do not know how to filter them... Thanks if you have an answer!

CodePudding user response：

Welcome to the internet. Bot/spam detection is one of the most non-trivial problems to solve. Every check you add on the server can be countered by a client that adapts to it.

AWS itself has a tool for it. https://aws.amazon.com/waf/features/bot-control/

A good strategy for filtering traffic depends on your use case.

Some suggestions:

Introduce login/sessions and allow only authenticated sessions
Filter on request headers
Filter by IP range
Limit the amount of traffic from a single IP
Limit the request rate across different IPs, etc.
Take the service offline when it is not required
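
As a rough illustration of the header-filtering and authenticated-only suggestions, the logging filter can be turned around: rather than listing bot user agents, log only requests that identify themselves as one of your own clients. This is a minimal sketch, not a definitive implementation. The header name `x-api-key`, the `KNOWN_KEYS` set, and the function name `skipUnlessKnownClient` are all hypothetical; substitute whatever your real clients actually send (a session cookie, a bearer token, etc.).

```javascript
// Hypothetical header name and key set; adapt to however your real clients identify themselves.
const API_KEY_HEADER = 'x-api-key';
const KNOWN_KEYS = new Set(['replace-with-a-real-key']);

// Returns true when the request should NOT be logged.
function skipUnlessKnownClient(req) {
  // Express exposes req.get(name); fall back to raw headers for plain objects.
  const key = typeof req.get === 'function'
    ? req.get(API_KEY_HEADER)
    : (req.headers || {})[API_KEY_HEADER];
  return !KNOWN_KEYS.has(key);
}

// Possible wiring, using the same option as in the question:
// morganBody(app, { skip: skipUnlessKnownClient, logRequestBody: false, logResponseBody: false });
```

This only reduces log volume. The bot requests still reach the server, so it complements blocking at the edge (WAF, IP filters) and does not replace it.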

There should be more material available on the internet.
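
For the per-IP traffic suggestion, one possible starting point is a small fixed-window counter. The sketch below is an assumption-laden example: the window length, the threshold, and the name `isOverLimit` are made up for illustration, and the counts live in memory on a single instance, so with two EC2 instances each one counts separately.

```javascript
// Assumed limits, chosen only for illustration.
const WINDOW_MS = 60_000;
const MAX_PER_WINDOW = 100;

// ip -> { windowStart, count }. Entries are never evicted in this sketch.
const hits = new Map();

// Returns true once an IP has sent more than MAX_PER_WINDOW requests in the current window.
function isOverLimit(ip, now = Date.now()) {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // First request from this IP, or the previous window has expired.
    hits.set(ip, { windowStart: now, count: 1 });
    return false;
  }
  entry.count += 1;
  return entry.count > MAX_PER_WINDOW;
}

// Possible use as a logging filter: skip: (req) => isOverLimit(req.ip)
```

Behind a load balancer, `req.ip` only reflects the client address if Express's `trust proxy` setting is configured; otherwise every request appears to come from the load balancer.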