Home > Software design >  How to keep track of "streaks" in the database?
How to keep track of "streaks" in the database?

Time:06-28

In StackOverflow, you can earn a few badges which are based on streaks, and from those you can imagine some other use cases of tracking streaks:

  • Visit the site every day for 30 days.
  • Vote on 30 posts per day, for 7 days in a row.
  • Etc..

How would you keep track of and implement this in any kind of database? (Doesn't really matter which one, SQL, NoSQL, etc..)

The first hard question/problem is, do you do it in cron/background jobs/tasks, or on every change to the database (every page view or vote, for example)?

The second hard problem is, depending on the previous solution, how do you somewhat efficiently query the state of the database to figure out if the conditions are met, and how do you not count previous "object actions" in your new calculation?

So let's focus on the votes one, since that is a little more complicated. Say we are tracking "UTC days".

If we go with the "check on every database change" solution (to part 1), then on next vote, after save to DB, query all votes since past UTC date for that user. If vote count >= 30, create a new "streak" record and do count as 1. Then on next vote (same day), we somehow need to know we already counted it, hmm, so need to modify the approach. So maybe we track "last_vote_id" for the current day, so we have (pseudocode):

table vote_streaks {
  date last_date;
  int last_vote_id;
  int last_date_count;
  int total_streak_count;
}

Then on the next vote / same day, we check when the last_date we tracked the vote (today), and the last_date_count (say we had 30 votes at this point), and the total streak is 1, we know that we should ignore the 31st vote for that day. Then on day 2, last_date doesn't match today, so we reset last_date_count to 0 before we set it to 1. I think something like that might work? I can't quite tell.

If we went with the approach to do a cron/background job, then we would query all votes (limit 30) for each user, sometime after the start of the UTC day. If we can build a streak out of that (that is more straightforward to do), then we can get the streak. But what if some problem occurs in the middle of the job and it cuts out, and has to restart? I can't imagine how to solve this yet, but it seems way more complicated than the other real-time approach.

How is this generally solved? Again, don't need to get into the weeds of the actual SQL/NoSQL tables (unless you'd like, pseudocode is fine). Or instead of a general solution, if you know of a specific solution (like how StackOverflow implements it), that would work too.

CodePudding user response:

The smallest data you need to keep track of here is going to be the date of the most recent action by the user and the current streak count.

When the user done the streaked action, you need to check the date of the most recent action. If the most recent action happens within less than a day but now is already a different day, you increment the streak count; if it's further than two days, you reset the streak count to zero; otherwise you update the most recent action timestamp

When checking for streaks, you need to check for the most recent action timestamp as well. If the timestamp was last updated within the last day, the streak count is valid, otherwise the real streak count is really zero.

Alternatively, you can just simply do a document database and do the full streak calculations based on transaction logs. This will get expensive for long streak users, though it will be the simplest to implement. Depending on how often you expect people to have long streaks, this might or might not be acceptable.

  • Related