Appropriate REST design for data export

Time:10-01

What's the most appropriate way in REST to export something as PDF or other document type?

The next example explains my problem:

I have a resource called Banana. I created all the canonical CRUD REST endpoints for that resource (i.e. GET /bananas; GET /bananas/{id}; POST /bananas/{id}; ...). Now I need to create an endpoint that downloads a file (PDF, CSV, ...) containing a representation of all the bananas.

The first thing that came to mind is GET /bananas/export, but in pure REST, verbs in the URL should not be allowed. A more appropriate HTTP method might be cool, something like EXPORT /bananas, but unfortunately this is not (yet?) possible.

Finally I thought about using the Accept header on the same GET /bananas endpoint, which based on the different media type (application/json, application/pdf, ..) returns the corresponding representation of the data (json, pdf, ..), but I'm not sure if I am misusing the Accept header in this way.

Any ideas?

CodePudding user response:

Media types are the best way to represent this, but there is a practical aspect to it: people will browse a REST API using root nouns... I'd put some record-count limits on it, maybe GET /bananas/export/100 to get the first 100, and GET /bananas/export/all if they really want all of them.

CodePudding user response:

"in pure rest using verbs in url should not be allowed."

REST doesn't care what spelling conventions you use in your resource identifiers.

Example: https://www.merriam-webster.com/dictionary/post

Even though "post" is a verb (and worse, an HTTP method token!) that URI works just like every other resource identifier on the web.


The more interesting question, from a REST perspective, is whether the identifier should be the same one that is used in some other context, or a different one.

REST cares a lot about caching (that's important to making the web "web scale"). In HTTP, caching is primarily about re-using prior responses. The basic (but incomplete) idea being that we may be able to re-use a response that shares the same target URI.

HTTP also has built into it a general purpose mechanism for invalidating stored responses that is also focused on the target URI.

So here's one part of the riddle you need to think about: when someone sends a POST request to /bananas, should caches throw away the prior responses with the PDF representations?

If the answer is "no", then you need a different target URI. That can be anything that makes sense to you. /pdfs/bananas for example. (How many common path segments are used in the identifiers depends on how much convenience you will realize from relative references and dot segments.)
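To make the invalidation question concrete, here is a minimal sketch of a toy cache keyed only by target URI. ToyCache and its methods are hypothetical names for illustration, not a real library; an actual HTTP cache also considers freshness, Vary, and more. It only shows that a successful POST to /bananas evicts what was stored under /bananas while leaving /pdfs/bananas alone.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


class ToyCache:
    """Deliberately simplified stand-in for an HTTP cache (hypothetical)."""

    def __init__(self):
        self._stored = {}  # target URI -> stored response body

    def store(self, uri, body):
        self._stored[uri] = body

    def lookup(self, uri):
        return self._stored.get(uri)

    def observe(self, method, uri, status):
        # A non-error (2xx/3xx) response to an unsafe method invalidates
        # whatever is stored for that same target URI.
        if method not in SAFE_METHODS and 200 <= status < 400:
            self._stored.pop(uri, None)


cache = ToyCache()
cache.store("/bananas", b"%PDF- all bananas")
cache.store("/pdfs/bananas", b"%PDF- all bananas")

# Someone creates a new banana.
cache.observe("POST", "/bananas", 201)

shared_uri_survives = cache.lookup("/bananas") is not None
separate_uri_survives = cache.lookup("/pdfs/bananas") is not None
```

If the PDF lives under the same URI as the collection, it is thrown away together with the other stored responses; under its own URI it stays until it expires or is invalidated separately.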

If the answer is "yes", then you may want to lean into using content negotiation.
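Content negotiation on GET /bananas could look roughly like the sketch below. The function names and the SUPPORTED tuple are assumptions made for illustration, and the Accept parsing is simplified: it orders by q-value only and ignores the specificity rules a full implementation would apply. The response carries Vary: Accept so that caches keep the JSON and PDF responses apart.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


# Hypothetical list of representations the /bananas resource can produce.
SUPPORTED = ("application/json", "application/pdf", "text/csv")


def choose_representation(accept_header):
    """Pick a supported media type, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return SUPPORTED[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "application/"
            for candidate in SUPPORTED:
                if candidate.startswith(prefix):
                    return candidate
    return None


def response_headers(accept_header):
    """Return (status, headers) for GET /bananas given the Accept header."""
    chosen = choose_representation(accept_header)
    if chosen is None:
        return 406, {"Vary": "Accept"}
    return 200, {"Content-Type": chosen, "Vary": "Accept"}
```

A client asking for application/pdf would get the PDF, one sending no Accept header would get the first supported type (JSON here), and one asking only for something unsupported would get 406 Not Acceptable.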

In some cases, the answer might be "both" -- which is to say, to have multiple resources (each with its own identifier) that return the same representations.
That's a normal thing to do; we even have a mechanism for describing which resource is "preferred" (see RFC 6596).
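As a rough illustration of the "both" case, the duplicate resource can point at the preferred one with a Link response header carrying the canonical relation from RFC 6596. The helper name is hypothetical, and treating /bananas as the preferred URI is only an assumption for the example.

```python
def canonical_link_header(preferred_uri):
    """Build a Link header naming the preferred URI (rel="canonical")."""
    return {"Link": f'<{preferred_uri}>; rel="canonical"'}


# Headers for a response served from /pdfs/bananas, assuming /bananas
# is the resource the API owner prefers clients to reference.
pdf_headers = {"Content-Type": "application/pdf"}
pdf_headers.update(canonical_link_header("/bananas"))
```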
