Context
I am quite new to Django and I am trying to write a complex query that I think would be easily writable in raw SQL, but for which I am struggling using the ORM.
Models
I have several models named SignalValue
, SignalCategory
, SignalSubcategory
, SignalType
, SignalSubtype
that have the same structure like the following model:
class MyModel(models.Model):
id = models.BigAutoField(primary_key=True)
name = models.CharField()
fullname = models.CharField()
I also have explicit models that represent the relationships between the model SignalValue
and the other models SignalCategory
, SignalSubcategory
, SignalType
, SignalSubtype
. Each of these relationships are named SignalValueCategory
, SignalValueSubcategory
, SignalValueType
, SignalValueSubtype
respectively. Below is the SignalValueCategory
model as an example:
class SignalValueCategory(models.Model):
signal_value = models.OneToOneField(SignalValue)
signal_category = models.ForeignKey(SignalCategory)
Finally, I also have the two following models. ResultSignal
stores all the signals related to the model Result
:
class Result(models.Model):
pass
class ResultSignal(models.Model):
id = models.BigAutoField(primary_key=True)
result = models.ForeignKey(
Result
)
signal_value = models.ForeignKey(
SignalValue
)
Query
What I am trying to achieve is the following.
For a given Result
, I want to retrieve all the ResultSignal
s that belong to it, filter them to keep the ones of my interest, and annotate them with two fields that we will call filter_group_id
and filter_group_name
. The values of two fields are determined by the SignalValue
of the given ResultSignal
.
From my perspective, the easiest way to achieve this would be first to annotate the SignalValue
s with their corresponding filter_group_name
and filter_group_id
, and then to join the resulting QuerySet
with the ResultSignal
s. However, I think that it is not possible to join two QuerySet
s together in Django. Consequently, I thought that we could maybe use Prefetch
objects to achieve what I am trying to do, but it seems that I am unable to make it work properly.
Code
I will now describe the current state of my queries.
First, annotating the SignalValue
s with their corresponding filter_group_name
and filter_group_id
. Note that filter_aggregator
in the following code is just a complex filter that allows me to select the wanted SignalValue
s only. group_filter
is the same filter but as a list of subfilters. Additionally, filter_name_case
is a conditional expression (Case()
construct):
# Attribute a group_filter_id and group_filter_name for each signal
signal_filters = SignalValue.objects.filter(
filter_aggregator
).annotate(
filter_group_id=Window(
expression=DenseRank(),
order_by=group_filters
),
filter_group_name=filter_name_case
)
Then, trying to join/annotate the SignalResult
s:
prefetch_object = Prefetch(
lookup="signal_value",
queryset=signal_filters,
to_attr="test"
)
result_signals: QuerySet = (
last_interview_result
.resultsignal_set
.filter(signal_value__in=signal_values_of_interest)
.select_related(
'signal_value__signalvaluecategory__signal_category',
'signal_value__signalvaluesubcategory__signal_subcategory',
'signal_value__signalvaluetype__signal_type',
'signal_value__signalvaluesubtype__signal_subtype',
)
.prefetch_related(
prefetch_object
)
.values(
"signal_value",
"test",
category=F('signal_value__signalvaluecategory__signal_category__name'),
subcategory=F('signal_value__signalvaluesubcategory__signal_subcategory__name'),
type=F('signal_value__signalvaluetype__signal_type__name'),
subtype=F('signal_value__signalvaluesubtype__signal_subtype__name'),
)
)
Normally, from my understanding, the resulting QuerySet
should have a field "test" that is now available, that would contain the fields of signal_filter
, the first QuerySet
. However, Django complains that "test"
is not found when calling .values(...)
in the last part of my code: Cannot resolve keyword 'test' into field. Choices are: [...]
. It is like the to_attr
parameter of the Prefetch
object was not taken into account at all.
Questions
- Did I missunderstand the functioning of
annotate()
andprefetch_related()
functions? If not, what am I doing wrong in my code for the specified parameterto_attr
to not exist in my resultingQuerySet
? - Is there a better way to join two
QuerySet
s in Django or am I better off using RawSQL? An alternative way would be to switch to Pandas to make the join in-memory, but it is very often more efficient to do such transformations on the SQL side with well-designed queries.
CodePudding user response:
You're on the right path, but just missing what prefetch does.
Your annotations are correct, but the "test" prefetch isn't really an attribute. You batch up the
SELECT * FROM signal_value
queries so you don't have to execute the select per row. Just drop the "test" annotation and you should be fine. https://docs.djangoproject.com/en/3.2/ref/models/querysets/#prefetch-relatedPlease don't use pandas, it's definitely not necessary and is a ton of overhead. As you say yourself, it's more efficient to do the transforms on the sql side
CodePudding user response:
From the docs on prefetch_related
:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
It's not obvious but the values()
call is part of these chained methods that imply a different query, and will actually cancel prefetch_related
. This should work if you remove it.