Home > Enterprise >  Django: How to "join" two querysets using Prefetch Object?
Django: How to "join" two querysets using Prefetch Object?

Time:09-23

Context

I am quite new to Django and I am trying to write a complex query that I think would be easily writable in raw SQL, but for which I am struggling using the ORM.

Models

I have several models named SignalValue, SignalCategory, SignalSubcategory, SignalType, SignalSubtype that have the same structure like the following model:

class MyModel(models.Model):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField()
    fullname = models.CharField()

I also have explicit models that represent the relationships between the model SignalValue and the other models SignalCategory, SignalSubcategory, SignalType, SignalSubtype. Each of these relationships are named SignalValueCategory, SignalValueSubcategory, SignalValueType, SignalValueSubtype respectively. Below is the SignalValueCategory model as an example:

class SignalValueCategory(models.Model):
    signal_value = models.OneToOneField(SignalValue)
    signal_category = models.ForeignKey(SignalCategory)

Finally, I also have the two following models. ResultSignal stores all the signals related to the model Result:

class Result(models.Model):
    pass


class ResultSignal(models.Model):
    id = models.BigAutoField(primary_key=True)

    result = models.ForeignKey(
        Result
    )
    signal_value = models.ForeignKey(
        SignalValue
    )

Query

What I am trying to achieve is the following. For a given Result, I want to retrieve all the ResultSignals that belong to it, filter them to keep the ones of my interest, and annotate them with two fields that we will call filter_group_id and filter_group_name. The values of two fields are determined by the SignalValue of the given ResultSignal.

From my perspective, the easiest way to achieve this would be first to annotate the SignalValues with their corresponding filter_group_name and filter_group_id, and then to join the resulting QuerySet with the ResultSignals. However, I think that it is not possible to join two QuerySets together in Django. Consequently, I thought that we could maybe use Prefetch objects to achieve what I am trying to do, but it seems that I am unable to make it work properly.

Code

I will now describe the current state of my queries.

First, annotating the SignalValues with their corresponding filter_group_name and filter_group_id. Note that filter_aggregator in the following code is just a complex filter that allows me to select the wanted SignalValues only. group_filter is the same filter but as a list of subfilters. Additionally, filter_name_case is a conditional expression (Case() construct):

# Attribute a group_filter_id and group_filter_name for each signal
signal_filters = SignalValue.objects.filter(
    filter_aggregator
).annotate(
    filter_group_id=Window(
        expression=DenseRank(),
        order_by=group_filters
    ),
    filter_group_name=filter_name_case
)

Then, trying to join/annotate the SignalResults:

prefetch_object = Prefetch(
    lookup="signal_value",
    queryset=signal_filters,
    to_attr="test"
 )

result_signals: QuerySet = (
    last_interview_result
        .resultsignal_set
        .filter(signal_value__in=signal_values_of_interest)
        .select_related(
            'signal_value__signalvaluecategory__signal_category', 
            'signal_value__signalvaluesubcategory__signal_subcategory',
            'signal_value__signalvaluetype__signal_type',
            'signal_value__signalvaluesubtype__signal_subtype',
        )
        .prefetch_related(
            prefetch_object
        )
        .values(
            "signal_value",
            "test",
            category=F('signal_value__signalvaluecategory__signal_category__name'), 
            subcategory=F('signal_value__signalvaluesubcategory__signal_subcategory__name'),
            type=F('signal_value__signalvaluetype__signal_type__name'),
            subtype=F('signal_value__signalvaluesubtype__signal_subtype__name'),
        )
)

Normally, from my understanding, the resulting QuerySet should have a field "test" that is now available, that would contain the fields of signal_filter, the first QuerySet. However, Django complains that "test" is not found when calling .values(...) in the last part of my code: Cannot resolve keyword 'test' into field. Choices are: [...]. It is like the to_attr parameter of the Prefetch object was not taken into account at all.

Questions

  1. Did I missunderstand the functioning of annotate() and prefetch_related() functions? If not, what am I doing wrong in my code for the specified parameter to_attr to not exist in my resulting QuerySet?
  2. Is there a better way to join two QuerySets in Django or am I better off using RawSQL? An alternative way would be to switch to Pandas to make the join in-memory, but it is very often more efficient to do such transformations on the SQL side with well-designed queries.

CodePudding user response:

You're on the right path, but just missing what prefetch does.

  1. Your annotations are correct, but the "test" prefetch isn't really an attribute. You batch up the SELECT * FROM signal_value queries so you don't have to execute the select per row. Just drop the "test" annotation and you should be fine. https://docs.djangoproject.com/en/3.2/ref/models/querysets/#prefetch-related

  2. Please don't use pandas, it's definitely not necessary and is a ton of overhead. As you say yourself, it's more efficient to do the transforms on the sql side

CodePudding user response:

From the docs on prefetch_related:

Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.

It's not obvious but the values() call is part of these chained methods that imply a different query, and will actually cancel prefetch_related. This should work if you remove it.

  • Related