Home > database >  How to discover the reason for an N 1 query in DRF serializer
How to discover the reason for an N 1 query in DRF serializer

Time:06-06

I have a Django REST Framework -based project where one of the endpoints is super slow. Turns out it's an N 1 queries problem, where each of the returned hundreds of rows causes an additional query. The endpoint is not supposed to do any joins or additional queries, just return the contents of a single table, filtered a bit. The model for that table contains a few foreign keys, but none of the referents are supposed to be accessed, only the ID's are supposed to be rendered as JSON and sent to the client.

A bit of trial and error reveals that it's DRF's Serializer that causes the additional queries:

class FooSerializer(serializers.ModelSerializer):
    class Meta:
        model = Foo
        fields = ["data_field_a", "data_field_b", "bar_code", "qux_id"]

When I comment bar_code out in the serializer, the query becomes as fast as expected. What's surprising is thatqux_id, even though the field is basically identical to bar_code, doesn't cause such a slowdown. I read https://www.django-rest-framework.org/api-guide/relations/ and among other stuff, tried setting depth = 0 in the Meta class.

Also, I understand that using select_related or prefetch_related would essentially serve as a quick fix for the problem, but strictly saying, those shouldn't be needed for this case, as no joins or additional queries are actually warranted, and I'm trying to understand the root cause of the problem to deepen my understanding.

So, I have a good grasp of actual cause of the slowdown (additional queries by the serializer), but less so about why the serializer decides to do those queries for one field but not for the other. My question is, how should I go on debugging this? I have no prior experience in DRF.

Edit: I was asked to share the code of the model, so here we go. I removed all the other fields, as they didn't seem to have anything to do with the bug; I could still reproduce it:

class Foo(models.Model):
    bar_code = models.ForeignKey(
        Bar, to_field="bar_code", db_column="bar_code", on_delete=models.CASCADE
    )
    qux_id = models.ForeignKey(
        Qux, to_field="qux_id", db_column="qux_id", on_delete=models.CASCADE
    )

(Edit: as later turns out, the difference in these two is whether the field is the primary key of the referred model or not.)

Edit2: Even more minimal:


class FooSerializer(serializers.ModelSerializer):
    class Meta:
        model = Foo
        fields = ["bar_code"]
class Foo(models.Model):
    bar_code = models.ForeignKey(
        Bar, to_field="bar_code", db_column="bar_code", on_delete=models.CASCADE
    )

CodePudding user response:

So, I discovered my answer, and here's how I debugged it. Mainly three things:

  1. Inspecting what the Serializer does. https://www.django-rest-framework.org/api-guide/relations/#inspecting-relationships
poetry run python manage.py shell
...
>>> from app.views import FooSerializer
>>> print(repr(FooSerializer()))
FooSerializer():
    bar_code = SlugRelatedField(queryset=Bar.objects.all(), slug_field='unique_code')

This clearly shows, that it is considering the field as a reference, not just a plain value. This might be obvious to some Django developers, but it certainly was not for me, as the value we are trying to serialize is the value of the field on Foo, not a field of Bar. It seems that it goes by logic like "Fetch Bar using a foreign key to Bar's unique (non-PK) field. After having a Bar, read that field." But that seems counterintuitive and unneccessary to me, as we already had the value to access Bar to begin with!

  1. Doing a minimal reproduction of the problem: https://github.com/golddranks/drf-demo

This helped me to notice what was relevant about the problem and what was not. Playing with the reproduction also made me notice that the problem was not just about the Serializer, but just generally accessing the bar_code field. I realized that I didn't understand how to access the underlying value of that field, without triggering fetching the whole referenced model. By playing around, I also discovered that when bar_code is the primary key of Bar, it is actually smart enough to not load Bar, but just access the value of Foo's foreign key value directly!

  1. Asking around with the newfounded knowledge.

A friend send me a pull request that shows how to access the underlying value of the foreign key field: accessing the field as bar_code_id, with a suffix _id attached. What was surprising to me, was that the DB table doesn't have such a column! From what I gathered from the documentation (https://docs.djangoproject.com/en/dev/ref/models/fields/#database-representation), I had thought that accessing the underlying field in such way was possible in the case Django automatically created an underlying ID field, but in this case it didn't, we manually created it, and it certainly doesn't have the suffix _id! However bar_code_id works and manages to fetch the underlying value without triggering loading Bar!

  • Related