API responses: references inside the same response is a bad practice?


For years at my job I have worked with a request that gives us a response like this (a shortened example with many fields cut out):

{
    "catalog": {
        "categories": [
            {
                "id": "firstCategory",
                "name": "The first category!",
                "order": 0,
                "offers": [
                    {
                        "id": "offer1" // a LOT of fields
                    },
                    {
                        "id": "offer2" // a LOT of fields
                    },
                    {
                        "id": "offer3" // a LOT of fields
                    }
                ]
            },
            {
                "id": "secondCategory",
                "name": "The second category!",
                "order": 1,
                "offers": [
                    {
                        "id": "offer2" // a LOT of fields ... same again
                    },
                    {
                        "id": "offer3" // a LOT of fields ... same again
                    },
                    {
                        "id": "offer4" // a LOT of fields
                    }
                ]
            },
            {
                "id": "thirdCategory",
                "name": "The third category!",
                "order": 2,
                "offers": [
                    {
                        "id": "offer1" // a LOT of fields ... same again
                    },
                    {
                        "id": "offer4" // a LOT of fields ... same again
                    },
                    {
                        "id": "offer5" // a LOT of fields
                    }
                ]
            }
        ]
    }
}

As you can see, we have a certain number of categories (in my day-to-day work, 5-15) that each contain some offers (10-30 per category). These offers are often repeated across different categories, and the repeated offers are exactly identical (the offer with id="offer1" inside firstCategory is exactly the same as the offer with id="offer1" inside thirdCategory).

This happens very often and, as if that weren't enough, an offer element contains many fields (something like 50; in my example I cut them for obvious reasons), which leads to a really large response.

I was wondering whether it is correct practice to remedy the problem with a reference system set up like this:

{
    "catalog": {
        "categories": [
            {
                "id": "firstCategory",
                "name": "The first category!",
                "order": 0,
                "idOffers": [ "offer1", "offer2", "offer3" ]
            },
            {
                "id": "secondCategory",
                "name": "The second category!",
                "order": 1,
                "idOffers": [ "offer2", "offer3", "offer4" ]
            },
            {
                "id": "thirdCategory",
                "name": "The third category!",
                "order": 2,
                "idOffers": [ "offer1", "offer4", "offer5" ]
            }
        ],
        "offers": [
            {
                "id": "offer1" // a LOT of fields... but written only once
            },
            {
                "id": "offer2" // a LOT of fields... but written only once
            },
            {
                "id": "offer3" // a LOT of fields... but written only once
            },
            {
                "id": "offer4" // a LOT of fields... but written only once
            },
            {
                "id": "offer5" // a LOT of fields... but written only once
            }
        ]
    }
}

While it seems more practical and cleaner to me, what I want to know is: is it a bad practice?
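To get a feel for the difference, here is a minimal sketch (in Python, with hypothetical stub data standing in for the ~50 real fields) comparing the serialized size of the nested shape against the referenced shape:

```python
import json

# Hypothetical stand-in for an offer with its ~50 fields.
def make_offer(offer_id):
    return {"id": offer_id, **{f"field{i}": f"value {i}" for i in range(50)}}

offers = {oid: make_offer(oid)
          for oid in ["offer1", "offer2", "offer3", "offer4", "offer5"]}
membership = {
    "firstCategory": ["offer1", "offer2", "offer3"],
    "secondCategory": ["offer2", "offer3", "offer4"],
    "thirdCategory": ["offer1", "offer4", "offer5"],
}

# Original shape: every offer embedded in full, duplicates included.
nested = {"catalog": {"categories": [
    {"id": cat, "offers": [offers[oid] for oid in ids]}
    for cat, ids in membership.items()
]}}

# Proposed shape: categories hold only ids; each offer appears once.
referenced = {"catalog": {
    "categories": [{"id": cat, "idOffers": ids}
                   for cat, ids in membership.items()],
    "offers": list(offers.values()),
}}

nested_size = len(json.dumps(nested))
referenced_size = len(json.dumps(referenced))
print(nested_size, referenced_size)  # the referenced payload is noticeably smaller
```

With 9 embedded offer copies collapsed down to 5 unique ones, the saving already shows; with 5-15 categories of 10-30 offers each, it grows accordingly.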

PS: if anyone is wondering whether it would be more correct to make two requests, such as GET catalog/categories and GET catalog/{idCategory}/offers, note that for various business and architectural reasons it must remain a single request.

CodePudding user response:

There's nothing intrinsically wrong with your suggestion; it may well be a much better approach. Apart from the obvious benefit of reducing the payload size, your suggestion will also eliminate the possibility of data anomalies. Anomalies could occur in the existing structure because it allows for two different 'copies' of the same offer to have property values that don't tally. If you don't completely and utterly trust the data source, then you might feel the need to validate incoming offers. But you can't even do this, because which of the many copies of (say) offer1 could you regard as the 'master' copy?
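The anomaly risk described above can be made concrete with a short sketch (hypothetical Python; `catalog` stands for the parsed nested response, and the `price` field is illustrative):

```python
from collections import defaultdict

def find_anomalies(catalog):
    """Group every embedded copy of an offer by id and report the ids
    whose copies do not all carry identical field values."""
    copies = defaultdict(list)
    for category in catalog["categories"]:
        for offer in category["offers"]:
            copies[offer["id"]].append(offer)
    return [oid for oid, seen in copies.items()
            if any(copy != seen[0] for copy in seen[1:])]

# Two copies of "offer1" that don't tally: which one is the master?
catalog = {"categories": [
    {"id": "firstCategory", "offers": [{"id": "offer1", "price": 10}]},
    {"id": "thirdCategory", "offers": [{"id": "offer1", "price": 12}]},
]}
print(find_anomalies(catalog))  # ['offer1']
```

In the referenced shape this check becomes unnecessary: there is exactly one copy of each offer, so there is nothing to disagree.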

It is for this reason that Ted Codd invented First Normal Form, which is popularly summarised as "eliminate repeating groups".

I said that this may well be a better approach, but whether it actually is a better approach depends a little on your context. For example, if you want to immediately store this data in a relational database, then it's definitely a better approach because the normalisation has been pushed upstream, possibly back to the source, but certainly outside the scope of your application, meaning you don't have to worry about the aforementioned anomalies, and you have way less data to process.

On the other hand, if you want to produce a hierarchical report - or store the data in a hierarchical structure akin to the one you have received, then it's not quite so clear cut. I would still argue eliminating the risk of anomalies is a big gain, as is the potentially large reduction in payload size, but you may pay a price by having to cache the offers somewhere, and then having to 'look up' the offer data as you process each idOffers array. If performance suffers badly from this you might need to weigh these things up. If there are a relatively small number of offers, I doubt this would be an issue.
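The "cache and look up" step is cheap in most languages: build a dictionary keyed by offer id once, then resolve each idOffers array against it. A minimal Python sketch (field names are illustrative) that re-inflates the referenced shape into the original nested one:

```python
def rebuild_nested(catalog):
    """Resolve each category's idOffers entries against an
    id -> offer map to recover the fully embedded shape."""
    by_id = {offer["id"]: offer for offer in catalog["offers"]}
    return {"categories": [
        {**{k: v for k, v in category.items() if k != "idOffers"},
         "offers": [by_id[oid] for oid in category["idOffers"]]}
        for category in catalog["categories"]
    ]}

catalog = {
    "categories": [
        {"id": "firstCategory", "order": 0, "idOffers": ["offer1", "offer2"]},
        {"id": "thirdCategory", "order": 2, "idOffers": ["offer1"]},
    ],
    "offers": [{"id": "offer1", "price": 10}, {"id": "offer2", "price": 7}],
}
nested = rebuild_nested(catalog)
print(nested["categories"][1]["offers"][0]["price"])  # 10
```

Building the map is O(number of offers) and each lookup is O(1), so for 10-30 offers per category this overhead is negligible.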

So I think your instincts are sound.
