Track a client through HTTP requests

For HTTP requests such as HEAD, GET, POST, etc., what information about the client does the server receive?

I know this includes the client's IP address, which can be used to block a user who, let's say, sends too many requests.

Another useful piece of information is the User-Agent header, which differs between browsers, scripts, curl, Postman, etc. (Of course, a client can override the default by setting the request header itself, but that's alright.)

Which other parameters can be used to identify a client (or infer some of its properties)? Does the server somehow get the MAC address?

So, is it possible to tell from the request alone whether it was made by a "bot" (e.g. Python or Java code) or by a genuine user?

Assume there is no token or other secret shared between client and server, so there is no session: each request is independent.

Answer:

The technique you are describing is generally called fingerprinting; the linked article covers the properties and techniques involved. Depending on how it is used, it attracts a lot of criticism, because it bypasses a user's intention to remain anonymous. In all cases it is a statistical technique, like most analytics: it identifies clients with some probability, never with certainty.
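As a rough illustration of the idea, here is a minimal sketch of passive, header-based fingerprinting in JavaScript (Node.js). The function name and the list of headers are hypothetical choices for this example; real fingerprinting combines many more signals, and two clients with identical headers behind the same IP will collide.

```javascript
const crypto = require("crypto");

// Hypothetical list of header names folded into the fingerprint.
// Real fingerprinting uses many more signals; this is only a sketch.
const FINGERPRINT_HEADERS = ["user-agent", "accept", "accept-language", "accept-encoding"];

// Build a passive fingerprint from the client IP and a few request headers.
// `headers` is assumed to use lowercase names, as Node's http module provides.
function passiveFingerprint(ip, headers) {
  const parts = [ip || ""];
  for (const name of FINGERPRINT_HEADERS) {
    // Missing headers contribute an empty value rather than being skipped,
    // so "absent" and "present" produce different fingerprints.
    parts.push(`${name}=${headers[name] || ""}`);
  }
  // Hash the joined values so the fingerprint has a fixed length (64 hex chars).
  return crypto.createHash("sha256").update(parts.join("\n")).digest("hex");
}
```

Inside an `http.createServer` handler you could call `passiveFingerprint(req.socket.remoteAddress, req.headers)`. Behind a proxy or load balancer, `remoteAddress` is the proxy's address, not the client's.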

Answer:

I don't think you can rely on any HTTP request header, because a client might not send it to the server, and/or there might be proxies between the client and the server that strip or alter the request headers.
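If you still want to look at headers, one way to respect this caveat is to treat every header as optional and use its absence only as a weak hint. The sketch below is hypothetical: the list of "usual" browser headers is an assumption for illustration, not a standard, and a script can trivially send all of them.

```javascript
// Headers that mainstream browsers normally send on page loads.
// This list is an assumption for illustration, not a standard.
const USUAL_BROWSER_HEADERS = ["user-agent", "accept", "accept-language", "accept-encoding"];

// Return the usual browser headers that are absent (or empty) in a request.
// A non-empty result is only a weak hint: scripts can send all of these,
// and proxies or privacy tools may strip some of them from real browsers.
function missingBrowserHeaders(headers) {
  return USUAL_BROWSER_HEADERS.filter((name) => {
    const value = headers[name];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

A request carrying only `user-agent` and `accept` would report `accept-language` and `accept-encoding` as missing; whether that should lower your trust in the request is a policy decision, not a verdict.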

If you just want to associate a unique ID with an HTTP request, you could generate an ID on your backend. For example, the JavaScript framework Hapi.js computes a request ID using this code:

new Date() + '-' + process.pid + '-' + Math.floor(Math.random() * 0x10000)
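The expression above can be wrapped in a function and compared with a variant that is less likely to collide. The second function is an alternative sketch, not Hapi's code, and it assumes a Node.js version that provides `crypto.randomUUID()` (14.17 or later).

```javascript
const crypto = require("crypto");

// The quoted expression wrapped in a function. `new Date()` stringifies to a
// human-readable date with one-second resolution, so uniqueness within a
// second rests on the 16-bit random suffix.
function hapiStyleRequestId() {
  return new Date() + "-" + process.pid + "-" + Math.floor(Math.random() * 0x10000);
}

// Alternative sketch (not Hapi's code): millisecond timestamp, process id,
// and a random UUID, which makes collisions between requests very unlikely.
function uuidRequestId() {
  return `${Date.now()}-${process.pid}-${crypto.randomUUID()}`;
}
```

Either ID identifies a request, not a client: you could log it or echo it back in a response header to correlate log lines, but it says nothing about who sent the request.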

You might not even need to generate an ID manually. For example, if your app is on AWS and there is an Application Load Balancer in front of your backend, the incoming request will have the custom header X-Amzn-Trace-Id.
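A minimal sketch of preferring the load balancer's header and falling back to a locally generated ID, for example during local development without a load balancer. It assumes Node's http module, which lowercases incoming header names, and Node 14.17+ for `crypto.randomUUID()`; the function name is hypothetical.

```javascript
const crypto = require("crypto");

// Prefer the trace ID added by an AWS Application Load Balancer, if present.
// The header is not a secret: a client that reaches the backend directly
// could set it, so use it for correlation only, never for trust decisions.
function requestIdFromHeaders(headers) {
  const traceId = headers["x-amzn-trace-id"];
  if (typeof traceId === "string" && traceId.length > 0) {
    return traceId;
  }
  // No load balancer in front: fall back to our own random ID.
  return crypto.randomUUID();
}
```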

As for distinguishing between requests made by human clients and bots, I think you could adopt a "time trap" approach like the one described in this answer about honeypots for spambots.
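The time-trap idea can be sketched as follows: when rendering a form, embed the issue time signed with a server-side HMAC, and on submission treat anything filled in faster than a human plausibly could as a bot. The secret, the 3-second threshold, and the function names are hypothetical. Note that this relaxes the question's assumption slightly: the server hands the client a signed value to echo back, although the secret itself never leaves the server.

```javascript
const crypto = require("crypto");

// Hypothetical server-side secret; in practice load it from configuration.
const SECRET = "replace-with-a-real-secret";
// Hypothetical threshold: submissions faster than this are treated as bots.
const MIN_FILL_MS = 3000;

function sign(value) {
  return crypto.createHmac("sha256", SECRET).update(value).digest("hex");
}

// Value to embed in a hidden form field when the form is rendered.
function issueFormStamp(nowMs = Date.now()) {
  return `${nowMs}.${sign(String(nowMs))}`;
}

// Returns "tampered", "too-fast" or "ok" for a submitted stamp.
function checkFormStamp(stamp, nowMs = Date.now()) {
  const [issuedStr, sig] = String(stamp).split(".");
  const given = Buffer.from(sig || "");
  const expected = Buffer.from(sign(issuedStr));
  // timingSafeEqual requires equal lengths, so check that first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return "tampered";
  }
  const elapsed = nowMs - Number(issuedStr);
  return elapsed < MIN_FILL_MS ? "too-fast" : "ok";
}
```

The HMAC stops a bot from simply sending an old timestamp of its own choosing, but a patient bot can still wait out the threshold, so this filters only naive scripts.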