I have a large data set (100,000 users) that needs to be cached and sorted for use at all times during a game's runtime. The system is of users in a game that have the ability to earn points. The system is broken up into parts for completing game tasks as follows:
Category Score: Positive integer score earned from completing tasks in a category of the game.
Category Rank: Rank (0 -> n) of users sorted by top score in a category.
Category Worth: The worth for the category rank in a given category. I.E: "Fighting" rank 0 gives you 10 global points while "Fighting" rank 1 give you 9 global points.
Global Score: A summation of points earned by each user depending on the category rank and score. I.E: If you earn a 0th place in fighting (worth 10 points), but 2nd place in defense (worth 8 points), your global score is 18th points.
Global Rank: Rank (0 -> n) sorted by global score.
In the MySQL table keeping this data, each entry is a kept in a row of (player [UUID], category [String], score [Integer])
. However, since data is modified every second based on user gameplay and the user needs to be able to access the global and individual category leaderboards at any time, this information needs to be cached in memory and pre-sorted to remain efficient. As an additional caveat, both category order and global order need to be kept available as separate data points considering a player may want to view an individual category rank and score as well as their global rank and score.
Currently, the system achieves this with the following steps:
- Get all entries from MySQL and log them into individual
User
objects with just the raw score for each category - Put all users into a category cache
(HashMap<Category, ArrayList<User>>)
with the list being sorted by their category score (this will be used for individual category leaderboards) - Loop through each category in the aforementioned category cache and put that user into a global cache
(ArrayList<User>)
with points summed. This is then sorted by total points which gives them their rank based on their index in the list.
The current means of sorting are by using Java 8's built in List#sort()
with comparators using the category's score and the global score respectively.
Currently, the sorting of this data with 100,000 players and 25 categories is around 40-60 seconds with a massive amount of memory and CPU power being used to sort each pass through. This refreshes every 10 minutes and is severely impacting server performance.
Any idea on how to re-work this system to alleviate some of the pressure on the server?
Edit: this post shows that for larger data sets, database sorting is more efficient, but I it doesn't fully cover how the ranking would be approached considering ranks are not based by one column, but rather a 25 calculations done based on the rank.
CodePudding user response:
I would recommend creating a database index kept in an ordered data structure with a command like this:
create index category_order using btree
on your_table (category, score DESC, player);
And now run the following query:
select score, player
from your_table
where category = 'your category'
order by score DESC;
MySQL should be able to answer this query by scanning the index and returning data in order. Since the necessary data is already kept in order, this should be very fast compared to the current approach.
Unfortunately MySQL can't tell you ranks as well as ordering. So you have to actually run the full query periodically to store ranks. This doesn't matter for showing a live top 10 result. But it won't show person 10,000 their rank improvement right after they do something.
For that you can cheat. You can store a score -> rank mapping in a table. When someone changes their score, you use their new score to estimate their new rank, then show them a page of data around that rank. As long as you update the mapping regularly, most users won't notice the cheat. But it is a cheat.
Can we not cheat? Of course! But it will take effort.
There is no reason in principle why a sorted data structure (eg red-black tree, btree) can't also calculate ranks on the fly. But in practice it requires keeping track of how many elements are below the node at each node. Since that is not often asked for, standard implementations don't include that.
But you can write your own. For example here is a Red-Black tree that supports rank. (Careful of copyright, either write your own or ask for permission to use that.) So if you keep a category in such a data structure, and also keep track of users and their values, you should be able to take a user, look at their score, look them up in the tree, find their rank, and (with a little cleverness) ALSO return a page of data of people around any rank you want. This data structure should be able to handle tens of thousands of requests and/or modifications per second on a single CPU. Replicate multiple copies of it in a well-designed service and people can always get live results, with only millisecond delays. And without any compromises.
If you need help beyond simply knowing that it is possible, well, you can contact me at my user name @gmail.com to discuss a consulting rate...