Home > Enterprise >  Best practice to limit DB text field just to prevent harm
Best practice to limit DB text field just to prevent harm

Time:02-22

There are some blog-articles in project I work, and, I guess, its text field should be limited somehow (probably it would be a JSON field one day)

There are no limitations in sense of domain - user can write as much as he wants, but, just to prevent DB harm by hack-attacks with uncommonly huge amounts of text, guess, some limit is needed.

As SO Q/A says:

  1. PostgreSQL limit text field with 1GB Is there a maximum length when storing into PostgreSQL TEXT
  2. http POST limits depend on browser (2GB - 4GB) https://serverfault.com/questions/151090/
  3. By rumors, Nginx's default client_max_body_size is 1MB

So, how to deal with all of that?

Probably, there are some practice like: "Just limit it with million chars in app-lvl and don't worry"?

CodePudding user response:

This is an interesting question. If I understand correctly you are working on an application where the DB entry will essentially be a blog post (typically 500-1000 words based on most blogs I read).

You are storing the blog post as a text field in your database. You are quite reasonably worried about what happens with large blobs of data.

I fully advocate for you having a limit on the amount of data the user can enter. Without fully understanding the architecture of your system it's impossible to say what is the theoretical max size based on the technologies used.

However, it is better to look at this from a user perspective. What is the maximum reasonable amount of text for you to have to store, then let's add a bit more say 10% because let's face it users will do the unexpected. You can then add an error condition when someone tried to enter more data.

My reason for proposing this approach is a simple one, once you have defined a maximum post size you can use boundary value analysis (testing just either side of the limit) to prove that your product behaves correctly both just below and at the limit. This way you will know and can explain the product behaviour to the users etc.

If you choose to let the architecture define the limit then you will have undefined behaviour. You will need to analyze each component in turn to work out the maximum size they will accept and the behaviour they exhibit when that size is exceeded.

Typically (in my experience) developers don't put this effort in and let the users do the testing for them. This of course is generally worse because the user reports a strange error message and then the debugging is ultimately time consuming and expensive.

  • Related