I would like to create a database to store a lot of information (about 100 MB every year, I'd say). I know that every year I will add a lot of rows to the database.
So I am asking whether the most optimized approach is to insert all those rows into the same table and add a year column
at the end to "archive" the data (because I need to keep the year).
Example of what it would look like:
info 1 | info 2 | info 3 | info 4 | 2021
info 5 | info 6 | info 7 | info 8 | 2021
info 9 | info 10 | info 11 | info 12 | 2021
.
.
info 1000 | info 1001 | info 1002 | info 1003 | 2022
info 1004 | info 1005 | info 1005 | info 1006 | 2022
Or is it better to create a new table each year containing only that year's rows (and so drop the year column)?
So, in your opinion, which is the most optimized?
CodePudding user response:
Are you sure it is 100 MB every year, and not a typo for 100 GB (which is still relatively small, but OK) or 1 TB? For 100 MB each year you can go with whichever solution you prefer; the two designs show no measurable difference whatsoever.
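For illustration, here is a minimal sketch of the single-table option using Python's built-in `sqlite3` module. The table and column names (`records`, `info1` to `info4`, `year`) and the index name are assumptions made up for this example, not something from the question; the index on `year` is just one common way to keep per-year lookups cheap.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: four data columns plus the year.
conn.execute(
    """
    CREATE TABLE records (
        info1 TEXT, info2 TEXT, info3 TEXT, info4 TEXT,
        year  INTEGER NOT NULL
    )
    """
)
# An index on year lets per-year lookups avoid scanning the whole table.
conn.execute("CREATE INDEX idx_records_year ON records (year)")

rows = [
    ("info 1", "info 2", "info 3", "info 4", 2021),
    ("info 5", "info 6", "info 7", "info 8", 2021),
    ("info 1000", "info 1001", "info 1002", "info 1003", 2022),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", rows)

# Fetch only the rows "archived" under 2021.
rows_2021 = conn.execute(
    "SELECT info1, info2, info3, info4 FROM records "
    "WHERE year = ? ORDER BY rowid",
    (2021,),
).fetchall()
print(rows_2021)
```

Adding next year's data is then just more `INSERT` statements with a different `year` value; no schema change is needed.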
CodePudding user response:
When dealing with optimization, your design choices about how to store data should always be driven by the kind of operations you need to carry out.
Your operations fall into two groups:
- interactive, carried out in real time (e.g. creation of an order by a user)
- batch, carried out once in a while (e.g. creation of weekly/monthly/yearly reports about orders)
Your decisions about optimization should give higher weight to interactive operations, as long as they're the most frequent ones.
Now let's get to your example:
- if you have no interactive operations that use data across different years, then the second design is better, as long as you always deal with just small pieces of the whole data; in the worst case you will JOIN (or UNION) the different tables once in a while.
- if you have some interactive operations that use data across different years, the first design is better, because you avoid the JOIN operation in an interactive (real-time) context.
Side note: JOIN operations are among the most expensive operations in a database, which is why you want to avoid them if you can.
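To make the trade-off concrete, here is a rough sketch of the same cross-year query under both designs, using Python's built-in `sqlite3` module. All table and column names (`records`, `records_2021`, `records_2022`, `info`) are hypothetical. Note that when the yearly tables share the same columns, their rows are usually stacked with `UNION ALL` rather than a `JOIN`, so that is what the sketch uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1: one table, with the year stored as a column.
conn.execute("CREATE TABLE records (info TEXT, year INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("info 1", 2021), ("info 5", 2021), ("info 1000", 2022)],
)

# Design 2: one table per year, without a year column.
conn.execute("CREATE TABLE records_2021 (info TEXT)")
conn.execute("CREATE TABLE records_2022 (info TEXT)")
conn.executemany("INSERT INTO records_2021 VALUES (?)", [("info 1",), ("info 5",)])
conn.executemany("INSERT INTO records_2022 VALUES (?)", [("info 1000",)])

# Row counts per year with design 1: a single query on a single table.
single = conn.execute(
    "SELECT year, COUNT(*) FROM records GROUP BY year ORDER BY year"
).fetchall()

# Same result with design 2: every yearly table has to be named explicitly,
# and the query must be edited each time a new year is added.
per_year = conn.execute(
    """
    SELECT 2021 AS year, COUNT(*) FROM records_2021
    UNION ALL
    SELECT 2022 AS year, COUNT(*) FROM records_2022
    ORDER BY year
    """
).fetchall()

print(single)
print(per_year)
```

Both queries return the same counts, but the per-year version grows with every new table, which is the cost you pay if cross-year queries turn out to be frequent.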