Django - Storing user-defined querying logic in a model-CodePudding

I'm making a school app with DRF. Teachers are able to create Exercises and associate them with Tags. An Exercise is thus in m2m relationship with Tag:

class Exercise(models.Model):
    MULTIPLE_CHOICE_SINGLE_POSSIBLE = 0
    MULTIPLE_CHOICE_MULTIPLE_POSSIBLE = 1
    OPEN_ANSWER = 2
    JS = 3
    C = 4

    EXERCISE_TYPES = (
        # ...
    )

    DRAFT = 0
    PRIVATE = 1
    PUBLIC = 2

    EXERCISE_STATES = (
        # ...
    )
    
    exercise_type = models.PositiveSmallIntegerField(choices=EXERCISE_TYPES)
    state = models.PositiveSmallIntegerField(choices=EXERCISE_STATES, default=DRAFT)
    text = models.TextField(blank=True)
    tags = models.ManyToManyField(Tag, blank=True)


class Tag(models.Model):
    name = models.TextField()

Teachers can use tags to create quizzes that randomly select exercises with certain tags. In order to do this, there's a Rule model, to which one or more Clauses are related. A Clause is in a m2m relationship with Tag.

Let's say a Rule has two Clauses associated; the first clause is associated to tag t1 and t2, and the second clause is associated to t3 and t4.

The Rule logic will pick an exercise that has these tags: (t1 OR t2) AND (t3 or t4)

class EventTemplateRule(models.Model):
    pass

class EventTemplateRuleClause(models.Model):
    rule = models.ForeignKey(
        EventTemplateRule,
        related_name="clauses",
        on_delete=models.CASCADE,
    )
    tags = models.ManyToManyField(Tag, blank=True)

The actual query is constructed at runtime using repated filter applications and Q objects.

This allows teachers to state conditions like:

pick an exercise with tag "binary search tree" and tag "easy"
pick an exercise with tag "graphs" and tag "DAG"
pick an exercise with tag "hard" and either "tree" or "linked list"
...

At the moment, I'm looking to make this system more expressive. It'd be nice to have more search parameters available, for example:

pick an exercise which has either tag "DAG" or "graph", AND has tag "easy", AND whose state isn't PRIVATE, AND whose text contains Let A be ... OR whose exercise_type is OPEN_ANSWER.

You get the idea: arbitrary AND'd/OR'd conditions on any of the exercise's fields and relations.

My question is: how would you store such a condition in a Rule model?

The first thing I thought is to use a JSONField to store the condition in a way similar to this:

{
    "and": [
         "or": [
            "has_tag": "1", // PK of the tag
            "has_tag": "22"
         ],
        "or": [
             "is_state": "2",
             "text_contains": "Let A be *", // valid regex
        ]
}

One issue I see with this is referential integrity: there is no way to check at the DBMS level that tags 1 and 22 actually exist. Same for valid values for fields like state. I can certainly check those things in a validator, but it feels a little hacky.

Another possible concern is safety: could user engineer conditions as to somehow inject arbitrary SQL in the resulting queries? This shouldn't happen if I construct the query using the ORM as opposed to generating my own raw SQL, but I'm asking just in case.

Is there a better way to go about this?

CodePudding user response：

I don't think JSON is a bad way of storing this, because it seems there's a lot of possible configurations to handle. Also, looking at the raw JSON is more immediately obvious what is going on that using models.

However, an alternative that is more django-aligned would be to use a RuleCriteria like model and formsets. I've put together what a model might look like, you'll need to give it far more thought than I have though:

class RuleCriteria(models.Model):
   rule = models.ForeignKey(Rule, ... # link to Rule
   parent_criteria = models.ForeignKey("self", ... # connect to the criteria before

   join = models.CharField(... # AND or OR or null

   comparison = models.CharField(... # your has_tag, is_state, etc.
   value = models.CharField(... # your '1', '22', etc.

You can then use formsets to manage how these are created and edited. The validation can also sit on the model when it's cleaned.

As to your other point about injection, this isn't a risk if you are still using the ORM to process the data that the end user has put together. It would be a risk if you were to use raw SQL and not sanitise the inputs.

CodePudding user response：

I don't see a better way than the way you described it, but then i don't really think its that much of an issue if properly unit-tested. Pre-loading all valid IDs, filters etc. once at application startup and then checking validity on-the-fly is simple and will not require extra database queries, so i don't see that much of an issue with it.