Home > Software design >  Django lookup by JSONField array value
Django lookup by JSONField array value

Time:12-01

Let's say I have MySQL database records with this structure

{
            "id": 44207,
            "actors": [
                {
                    "id": "9c88bd9c-f41b-59fa-bfb6-427b1755ea64",
                    "name": "APT41",
                    "scope": "confirmed"
                },
                {
                    "id": "6f82bd9c-f31b-59fa-bf26-427b1355ea64",
                    "name": "APT67",
                    "scope": "confirmed"
                }
            ],
},
{
            "id": 44208,
            "actors": [
                {
                    "id": "427b1355ea64-bfb6-59fa-bfb6-427b1755ea64",
                    "name": "APT21",
                    "scope": "confirmed"
                },
                {
                    "id": "9c88bd9c-f31b-59fa-bf26-427b1355ea64",
                    "name": "APT22",
                    "scope": "confirmed"
                }
            ],
},
...

"actors" is a JSONField

Any way I can filter all of the objects whose actors name contains '67', for example?

Closest variant I have is that I got it working like that:

queryset.filter(actors__contains=[{"name":"APT67"}])

But this query matches by exact actor.name value, while I want to to accept 'contains' operator.

I also have it working by quering with strict array index, like this:

queryset.filter(actors__0__name__icontains='67')

But it only matches if first element in array matches my request. And I need that object shall be returned in any of his actors matches my query, so I was expecting something like queryset.filter(actors__name__icontains='67') to work, but it's not working :(

So far I have to use models.Q and multiple ORs to support my needs, like this -

search_query = models.Q(actors__0__name__icontains='67') | models.Q(actors__1__name__icontains='67') | models.Q(actors__2__name__icontains='67') | models.Q(actors__3__name__icontains='67')
queryset.filter(search_query)

but this looks horrible and supports only 4 elements lookup(or I have to include more OR's)

Any clues if thats possible to be solved normal way overall?

CodePudding user response:

Following this answer and the linked answer in the same post.

'contains' or 'icontains' looks for the patterns '%string%', which in your case assumes '67' is between characters. But, the number pattern is at the end of your actor name.

So, based on the answers I linked, you should probably try endswith or iendswith, in order to look for the pattern 'g'

CodePudding user response:

My data model:

class MyCustomModel(models.Model):
   id = models.BigAutoField(primary_key=True)
   actors = models.JSONField(blank=True, null=True)

I ended up with quite hacky lookup operator which replaces '$."' into '$[*]."' in my JSON field queries, which in my case was making the correct query, filtering all the objects whos JSON field with array of objects, contains one of the needed property.

Lookup operator:

from django.db.models.lookups import IContains
from django.db.models import Field

# Custom lookup which acts like the default IContains lookup but also replaces field name to match all JSON array objects ['*'].field_name in the query instead of $.field_name.
# Maybe this could be done in a better way with better Q field path, but this works for now.
class JSONArrayContains(IContains):
    lookup_name = 'jsonarraycontains'
    
    def __init__(self, lhs, rhs):
        self.lookup_name = 'icontains' # we fake the lookup name to get the right operators further
        super().__init__(lhs, rhs)
        

    def as_sql(self, compiler, connection):
        lhs_sql, params = self.process_lhs(compiler, connection)

        # !! HERE IS THE MAGIC
        # we need to replace params parts which are like '$."name"' into parts like '$[*]."name"' if param is string and matches $." pattern
        params = [param.replace('$."', '$[*]."') if isinstance(param, str) and param.startswith('$."') else param for param in params] 

        rhs_sql, rhs_params = self.process_rhs(compiler, connection)
        params.extend(rhs_params)
        rhs_sql = self.get_rhs_op(connection, rhs_sql)
        return f'{lhs_sql} {rhs_sql}', params

Usage:

queryset.filter(actors__name__jsonarraycontains='67')

Which is filtering all the records, which

  • Related