Home > Back-end >  Django, Create GIN index for child element in JSON Array field
Django, Create GIN index for child element in JSON Array field

Time:04-20

I have a model that uses PostgreSQL and has field like this:

class MyModel(models.Model):
    json_field = models.JSONField(default=list)

This field contains data like this:

[
  {"name": "AAAAA", "product": "11111"},
  {"name": "BBBBB", "product": "22222"},
]

Now I want to index by json_field -> product field, because it is being used as identification. Then i want to create GinIndex like this:

class Meta:
    indexes = [
        GinIndex(name='product_json_idx', fields=['json_field->product'], opclasses=['jsonb_path_ops'])
    ]

When I try to create migration, I get error like this:

'indexes' refers to the nonexistent field 'json_field->product'.

How to create GinIndex that will be used for child attribute in Json Array?

CodePudding user response:

Please don't use a JSONField [Django-doc] for well-structured data: if the structure is clear, like here where we have a list of objects where each object has a name and a product, it makes more sense to work with extra models, like:

class MyModel(models.Model):
    # …
    pass

class Product(models.Model):
    # …
    pass

class Entry(models.Model):
    my_model = models.ForeignKey(MyModel, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)

This will automatically add indexes on the ForeignKeys, but will also make querying simpeler and usually more efficient.

While databases like PostgreSQL indeed have put effort into making JSON columns easier to query, aggregate, etc. usually it is still beter to perform database normalization [wiki], especially since it has more means for referential integrity, and a lot of aggregates are simpeler on linear data.

If for example later a product is removed, it will require a lot of work to inspect the JSON blobs to remove that product. This is however a scenario that both Django and PostgreSQL databases cover with ON DELETE triggers and which will likely be more effective and safe when using the Django toolchain for this.

  • Related