- A migration handles creating two new fields
action_duplicate
andstatus_duplicate
- The second migration copies the data from the
action
andstatus
fields, to the two newly created fields
def remove_foreign_keys_from_user_request(apps, schema_editor):
UserRequests = apps.get_model("users", "UserRequest")
for request_initiated in UserRequest.objects.all().select_related("action", "status"):
request_initiated.action_duplicate = request_initiated.action.name
request_initiated.status_duplicate = request_initiated.status.name
request_initiated.save()
- The third migration is suppose to remove/delete the old fields
action
andstatus
- The fourth migration should rename the new duplicate fields to the old deleted fields
The solution here is to remove the dependency on the status and action, to avoid unnecessary data base query, since the status especially will only be
pending
andcompleted
My question is for the second migration. The number of records are between 300k to 600k records so I need to know a more efficient way to do this so it doesn't take up all the memory available.
Note: The Database is MySQL
.
A trimmed-down version of the UserRequest
model
class UserRequest(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
reference = models.CharField(max_length=50, null=True, blank=True)
requester = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
action = models.ForeignKey(Action, on_delete=models.CASCADE)
action_duplicate = models.CharField(
max_length=50, choices=((ACTION_A, ACTION_A), (ACTION_B, ACTION_B)), default=ACTION_A
)
status = models.ForeignKey(ProcessingStatus, on_delete=models.CASCADE)
status_duplicate = models.CharField(
max_length=50,
choices=((PENDING, PENDING), (PROCESSED, PROCESSED)),
default=PENDING,
)
CodePudding user response:
You can work with a Subquery
expression [Django-doc], and do the update in bulk:
def remove_foreign_keys_from_user_request(apps, schema_editor):
UserRequests = apps.get_model('users', 'UserRequests')
Action = apps.get_user('users', 'Action')
Status = apps.get_user('users', 'ProcessingStatus')
UserRequests.objects.update(
action_duplicate=Subquery(
Action.objects.filter(
pk=OuterRef('action_id')
).values('name')[:1]
),
status_duplicate=Subquery(
Status.objects.filter(
pk=OuterRef('status_id')
).values('name')[:1]
)
)
That being said, it looks that what you are doing is actually the opposite of database normalization [wiki]: usually if there is duplicated data, you make an extra model where you make one Action
/Status
per value, and thus prevent having the same value for action_duplicate
/status_duplicate
multiple times in the database: this will make the database larger, and harder to maintain.
Note: normally a Django model is given a singular name, so
UserRequest
instead of.UserRequests