What's the most efficient way to check for orphans when deleting objects in Django?

Time:12-21

Say I have a model with a Things table and a table of relationships between the things called ThingRelations. It should not be possible to delete a Thing when there are ThingRelations that point to it, and when there are no more ThingRelations pointing to a given Thing, it should be deleted. I'm currently trying to implement that with signals.post_delete like this:

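from django.db import models
from django.dispatch import receiver

class ThingRelation(models.Model):
    # Two foreign keys to the same model need distinct related_name values,
    # otherwise Django's system checks report a reverse accessor clash
    first_thing = models.ForeignKey('Thing', on_delete=models.PROTECT, related_name='relations_as_first')
    second_thing = models.ForeignKey('Thing', on_delete=models.PROTECT, related_name='relations_as_second')

class Thing(models.Model):
    name = models.CharField(max_length=260)

@receiver(models.signals.post_delete, sender=ThingRelation)
def check_relation_and_delete(sender, instance, *args, **kwargs):
    # Use the raw foreign-key ids so the loop variable really is an id
    for thing_id in [instance.first_thing_id, instance.second_thing_id]:
        first_thing_match = ThingRelation.objects.filter(first_thing=thing_id).exists()
        second_thing_match = ThingRelation.objects.filter(second_thing=thing_id).exists()
        if not first_thing_match and not second_thing_match:
            # filter().delete() does nothing if the Thing is already gone
            Thing.objects.filter(pk=thing_id).delete()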

Is this the most efficient way to find and delete orphaned Things? I'm very new to databases in general, but won't filtering the (potentially quite large) Things table four times for every deleted ThingRelation be slow when deleting many objects at once? Is there some kind of SQL or Django functionality that makes it so this code isn't run for every object in a bulk operation?

Answer:

Signals are a poor fit for bulk operations: post_delete fires once per deleted object, even when you call QuerySet.delete(), so the handler's queries run for every row. Signals are also often considered an anti-pattern, because they make the logic hard to trace when you are debugging.

I'd advise a routine-based approach here instead. An example that might be helpful:

from django.db.models import Q

def remove_orphaned_things():
    # Ids of every Thing that some relation still references
    used_as_first = ThingRelation.objects.values_list('first_thing_id')
    used_as_second = ThingRelation.objects.values_list('second_thing_id')
    # Delete the Things that appear in neither column
    Thing.objects.exclude(
        Q(id__in=used_as_first) | Q(id__in=used_as_second)
    ).delete()

This function removes all orphaned Things. All that is left is to call it at the right times.
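Since the question also asks about plain SQL: the same cleanup can be written as a single DELETE with a NOT EXISTS subquery. The sketch below uses the standard-library sqlite3 module so it runs on its own; the table names `thing` and `thingrelation` and the sample rows are assumptions for illustration (real Django tables carry an app prefix), not the asker's actual schema.

```python
import sqlite3

# Hypothetical schema mirroring Thing / ThingRelation (names are assumptions)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE thingrelation (
        id INTEGER PRIMARY KEY,
        first_thing_id INTEGER NOT NULL REFERENCES thing(id),
        second_thing_id INTEGER NOT NULL REFERENCES thing(id)
    );
    INSERT INTO thing (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO thingrelation (first_thing_id, second_thing_id)
        VALUES (1, 2), (2, 3);
""")

def remove_orphaned_things(conn):
    """Delete every thing that no relation references; return the row count."""
    cursor = conn.execute("""
        DELETE FROM thing
        WHERE NOT EXISTS (
            SELECT 1 FROM thingrelation r
            WHERE r.first_thing_id = thing.id
               OR r.second_thing_id = thing.id
        )
    """)
    conn.commit()
    return cursor.rowcount

# Thing 4 is the only row that no relation points to
deleted = remove_orphaned_things(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM thing ORDER BY id")]
print(deleted, remaining)
```

The point is that the database does the work in one statement, instead of a handful of queries per deleted relation.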

The easiest and most straightforward way is to wrap it in an endless while-True loop with a sleep and run it as a daemon. For example:

import time  # the stdlib module that provides sleep(), not datetime.time

from django.core.management.base import BaseCommand
# Also some import of `remove_orphaned_things`

class Command(BaseCommand):
    def handle(self, *args, **options):
        while True:
            remove_orphaned_things()
            # Wait 5 minutes before the next cleanup pass
            time.sleep(300)

So it would be executed roughly every 5 minutes (the sleep plus however long the cleanup itself takes). It could be run under something like supervisor (better) or tmux (worse), so you can be sure it's always running.
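One thing to watch with a bare loop like this is that a single exception ends the daemon. A minimal sketch of a more forgiving loop follows; `run_periodically`, its `max_runs` and `sleep` parameters, and the `flaky_cleanup` stand-in are hypothetical helpers for illustration, not part of Django.

```python
import logging
import time

def run_periodically(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call task() repeatedly, pausing between runs; return the number of runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception:
            # Log and carry on so one failed cleanup does not stop the loop
            logging.exception("periodic task failed")
        runs += 1
        sleep(interval_seconds)
    return runs

# Stand-ins for a dry run: a task that fails once, and a fake sleep
# that only records the requested pause instead of waiting
calls, pauses = [], []

def flaky_cleanup():
    calls.append(len(calls))
    if len(calls) == 2:
        raise RuntimeError("simulated database error")

total = run_periodically(flaky_cleanup, 300, max_runs=3, sleep=pauses.append)
```

In the management command you would call it as `run_periodically(remove_orphaned_things, 300)`, leaving `max_runs` unset so it loops forever.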

A better way is to let a periodic task scheduler orchestrate it, for example periodiq, which builds on Dramatiq. Follow its installation guide (you'd also need RabbitMQ); here is a little example of how to organise it:

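import dramatiq
from periodiq import cron  # assumes the periodiq package, which provides cron()

@dramatiq.actor(periodic=cron('*/5 * * * *'), max_retries=0)
def remove_orphaned_things():
    ...  # exactly the same code as `remove_orphaned_things` above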

Good luck!
