I created a function to average the comments in TVAndMovie. However, when I created 100000 comments, it took an average of 4 seconds to get the average of the scores! I have been trying to find a way to get the average of the scores. How can I reduce the time to less than 0.01 seconds?
class TVAndMovie(models.Model):
tmdb_id = models.IntegerField(
verbose_name="",
blank=False,
null=False,
)
judge_tv_or_movie = models.CharField(
blank=False, null=False, default="movie", max_length=20
)
stars = models.FloatField(
blank=False,
null=False,
default=0,
validators=[MinValueValidator(0.0), MaxValueValidator(10.0)],
)
def get_comments(self) -> object:
return Comment.objects.filter(
tv_or_movie_id=self.id
)
def average_stars(self) -> float:
comments = self.get_comments()
n_comments = comments.count()
if n_comments:
self.stars = round(
sum([comment.stars for comment in comments]) / n_comments, 3
)
else:
self.stars = 0
self.save()
return self.stars
class Comment(models.Model):
comment = models.TextField(max_length=1000)
stars = models.FloatField(
blank=False,
null=False,
default=0,
validators=[MinValueValidator(0.0), MaxValueValidator(10.0)],
)
user = models.ForeignKey(CustomUser, on_delete=models.CASCADE)
tv_or_movie = models.ForeignKey(TVAndMovie, on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
class Meta:
unique_together = ("user", "tv_or_movie")
indexes = [models.Index(fields=["user", "tv_or_movie"])]
CodePudding user response:
So I decided to debug your code and measure performance - since I use faster computer (for 100k records it executes the average_stars
method in less than seconds) I created 500k Comments
. Then this method executes in around 4.5s.
I measured the performance using debug view:
class IndexView(TemplateView):
def get(self, *args, **kwargs):
tv_or_movie = TVAndMovie.objects.first()
start = time.time()
stars = tv_or_movie.average_stars()
end = time.time()
return JsonResponse({
'finish': end-start,
'rate': stars,
})
Then, I tried to write optimized code - instead of calling direct query I decided to use _set
directive and then cast stars
values to list
using values_list
method.
This way I got 0.2s execution time (performance). Code:
def average_stars(self) -> float:
get_stars = list(self.comment_set.values_list('stars', flat=True))
if get_stars:
self.stars = round(
sum(get_stars) / len(get_stars), 3
)
self.save()
return self.stars
Generally using .count()
is not bad, but len()
is faster, and you can execute it on list
that you got while caching stars
values to said list
.
Also I think this is the fastest AND simplest way possible to calculate average rating, you will probably not get less than 0.01s performance, but I think it's enough for you.
If you have any questions reply to me in comments.
--- UPDATE ---
I forgot that Django also supports Sum
in Django ORM. So basically you can simply calculate the average using this logic:
def average_stars(self) -> float:
count = self.comment_set.count()
if count:
get_sum = self.comment_set.aggregate(Sum('stars'))
self.stars = round(
get_sum['stars__sum'] / count, 3
)
self.save()
return self.stars
Which is double time faster, executing in around 0.1s.