I have a MySQL/MariaDB database where posts are stored. Each post has some statistical counters such as the number of times the post has been viewed for the current day, the total number of views, number of likes and dislikes.
For now, I plan to have all of the counter columns updated in real-time every time an action happens - a post gets a view, a like or a dislike. That means that the post_stats
table will get updated all the time while the posts
table will rarely be updated and will only be read most of the time.
The table schema is as follows:
posts(post_id, author_id, title, slug, content, created_at, updated_at)
post_stats(post_id, total_views, total_views_today, total_likes, total_dislikes)
The two tables are connected with a post_id
foreign key. Currently, both tables use InnoDB. The data from both tables will be always queried together to be able to show a post with its counters, so this means there will be an INNER JOIN
used all the time. The stats are updated right after reading them (every page view).
My questions are:
- For best performance when the tables grow, should I combine the two tables into one since the columns in
post_status
are directly related to thepost
entries, or should I keep the counter/summary table separate from the mainposts
table? - For best performance when the tables grow, should I use MyISAM for the
posts
table as I can imagine that MyISAM can be more efficient at reads while InnoDB at inserts?
This problem is general for this database and also applies to other tables in the same database such as users
(counters such as the total number views of their posts, the total number of comments written by them, the total number of posts written by them, etc.) and categories
(the number of posts in that category, etc.).
Edit 1: The views per day counters are reset once daily at midnight with a cron job.
Edit 2: One reason for having posts
and post_stats
as two tables is concerns about caching.
CodePudding user response:
For low traffic, KISS -- Keep the counters in the main
post
table. (I assume you have ruled this out.)For high traffic, keep the counters in a separate table. But let's do the "today's" counters differently. (This is what you want to discuss.)
For very high traffic, gather up counts so that you can do less than 1 Update per click/view/like. ("Summary Tables" is beyond the scope of this question.)
Let's study total_views_today
. Do you have to do a big "reset" every midnight? That is (or will become) too costly, so let's try to avoid it.
- Have only
total_views
in the table. - At midnight copy the table into another table. (
SELECT
is faster and less-invasive than theUPDATE
needed to reset the values.) Do this copy by building a new table, thenRENAME TABLE
to move it into place. - Compute
total_views_today
by subtracting the corresponding values in the two tables.
That left you with
post_stats(post_id, total_views, total_likes, total_dislikes)
For "high traffic, it is fine to do
UPDATE post_stats SET ... = ... 1 WHERE post_id = ...;
at the moment needed (for each counter).
But there is a potential problem. You can't increment a counter if the row does not exist. That would be best solved by creating a row with zeros at the same time the post
is created. (Otherwise, see IODKU.)
(I may come back if I think of more.)