Home > Back-end >  which table collation takes least space and least table size?
which table collation takes least space and least table size?

Time:05-03

I'm new to MySQL. In a classified ads management database I've a table called "expired".
I've selected its data type as Boolean which is in fact tinyint(1).
this field stores whether a posted ads (advertisement) is expired or not.
since this field will store only these two digits: 0 and 1, now question is which collation will takes least space for this table?
current table collation is utf8-general_ci.
In general I need to know whether there is a relation between table collation and table size?
I googled "database collation effect on database size" but didn't get any related and useful result!

CodePudding user response:

"Character set" is the encoding of characters; it is probably the term you wanted.

"Collation" controls how characters are compared; it does not involve space.

"Numbers" do not involve either "character sets" nor "collations".

The typical way to store a yes/no (true/false) value is in a TINYINT, which takes 1 byte. The "(1)" after your tinyint example provides no information and will soon be removed from the syntax.

If you have lots of true/false values, consider using the SET datatype. It can pack up to 64 values in up to 8 bytes.

"ascii" and "latin1" are single-byte characters. latin1 can handle a limited number of accented letters -- as would be found in Western European languages.

"utf8mb4" should be used for general character support. It still takes 1 byte per English letter, 1 or 2 bytes for the rest of Europe, and 3-4 bytes per Chinese character.

For a single boolean, there is nothing smaller than tinyint. For a small number of booleans, I recommend one tinyint each -- it is a tradeoff of a small amount of space versus complexity in the code. For a large number of booleans, SET, BINARY(n) (n is bytes, not bits)or some size ofINT` plus masking operations.

For an app that is limited to Western European languages, latin1 is handy and slightly more compact for strings. Beyond that, use utf8mb4.

Note each "character set" has a variety of "collations"; the default collation is usually appropriate.

A Rule of Thumb that I like to apply: If I can't see at least 10% improvement (in space or speed or whatever), move on. That is, look for something else that might give more improvement. (Andy's Comment is another way of saying my RoT.)

  • Related