Home > database >  REST API: Does validation on identifiers break encapsulation?
REST API: Does validation on identifiers break encapsulation?

Time:11-01

I figured I'd post here to get some ideas/feedback on something I've come up against recently. The API I've developed has validation on an identifier that's passed through as a path parameter: e.g. /resource/resource_identifier

There are some specific business rules as to what makes an idenfier valid and my API has validation which enforces these rules and returns a 400 when that's violated.

Now the reason I'm writing this is that I've been doing this sort of thing in every REST (ish) API I've ever written. It's kind of ingrained in me now but ecently I've been told that this is 'bad' and that it breaks encapsulation. Furthermore, it does this by forcing a consumer to have knowledge about the format of an identifier. I'm told that I should be returning a 404 instead and simply accept anything as an idenfier.

We've had some pretty heated debates about this and what encapsulation actually means in the context of REST. I've found numerous definitions but they aren't specific. As with any REST contention it's hard to substantiate an argument for either.

If StackOverflow would allow me, I'd like to try and gain a concensus on this and why APIs like Spotify for example, use 400 in this scenario.

CodePudding user response:

While it may sound natural to expose the resource internal ID as ID used in the URI, remember that the whole URI itself is the identifier of a resource and not only the last bit of the URI. Clients are usually also not interested in the characters that form the URI (or at least they shouldn't care about it) but only in the state they receive upon requesting that from the API/server.

Further, if you think long-term, which should be the reason why you want to build your design on top of a REST architecture, is there a chance that the internal identifier of a resource could ever change? If so, introducing an indirection could make more sense then i.e. by using UUIDs instead of product IDs in the URI and then have a further table/collection to perform a mapping from UUID to domain object ID. Think of a resource that exposes some data of a product. It may sound like a good idea to use the product ID at the end of the URI as they identify the product in your domain model clearly. But what happens if you company undergoes a merge with an other company that happens to have an overlap on product but then uses different identifiers than you? I've seen such cases in reality, unfortunately, and almost all of them wanted to avoid change for their clients and so had to support multiple URIs for the same products in the end.

This is exactly why Mike Amundsen said

... your data model is not your object model is not your resource model ... (Source)

REST is full of such indirection mechanisms to allow such systems to avoid coupling. I.e. besides above mentioned mechanism, you also have link-relations to allow servers to switch URIs when needed while clients can still lookup the URI via the exposed relation name, or its focus on negotiated media types and its representation formats rather than forcing clients to speak their API-specific RPC-like, plain-JSON slang.

Jim Webber further coined the term domain application protocol to describe that HTTP is an application protocol for exchanging documents and any business rules we infer are just side effects of the actual document management performed by HTTP. So all we do in "REST" is basically to send documents back and forth and infer some business logic to act upon receiving certain documents.

In regards to encapsulation, this isn't the scope of REST nor HTTP. What data you return depends on your business needs and/or on the capabilities of the representation formats exchanged. If a certain media-type isn't able to express a certain capability, providing such data to clients might not make much sense.

In general, I'd would recommend not to use domain internal IDs as part of URIs for the above mentioned reasons. Usually that information should be part of the exchanged payload to give users/customers the option to refer to that resources on other channels like e/mail, telephone, ... Of course, that depends on the resource and its purpose at hand. As a user I'd prefer to refer to myself with my full name rather than some internal user- or customer ID or the like.

edit: sorry, missed the validation aspect ...

If you expect user/client input on the server/API side, you should always validate the data before starting to process it. Usually though, URIs are provided by the server and might only trigger business activities if the URI requested matches one of your defined rules. In general, most frameworks will respond with 400 Bad Request responses when they couldn't map the URI to a concrete action, giving the client a chance to correct its mistake and reissue the updated request. As URIs shouldn't be generated or altered by clients anyways, validating such parameters might be unnecessary overhead unless they might introduce security risks. Here it might be a better approach then to toughen-up the mapping rules of URIs to actions then and let those frameworks respond with a 400 message when clients use stuff they aren't supposed to.

CodePudding user response:

I've been doing this sort of thing in every REST (ish) API I've ever written. It's kind of ingrained in me now but recently I've been told that this is 'bad'

In the context of HTTP, it is an "anti-pattern", yes.

I'm told that I should be returning a 404 instead

And that is the right pattern when you want the advantages of responding like a general purpose web server.

Here's the point: if you want general purpose components in the HTTP application to be able to do sensible things with your response messages, then you need to provide them with the appropriate meta data.

In the case of a target resource identifier that satisfies the request-target production rules defined in RFC 9112 but is otherwise unsatisfactory; you can choose any response semantics you want (400? 403? 404? 499? 200?).

But if you choose 404, then general purpose components will know that the response is an error that can be re-used for other requests (under appropriate conditions - see RFC 9111).


why APIs like Spotify for example, use 400 in this scenario.

Remember: engineering is about trade offs.

The benefits of caching may not outweigh more cost effective request processing, or more efficient incident analysis, or ....

It's also possible that it's just habit - it's done that way because that's the way that they have always done it; or because they were taught it as a "best practice", or whatever. One of the engineering trade offs we need to consider is whether or not to invest in analyzing a trade off!

An imperfect system that ships earns more market share than a perfect solution that doesn't.

  •  Tags:  
  • rest
  • Related