In Firestore in Datastore mode, the recommended way to store a counter with a high write rate (such as profile views on a website) is to use sharded/distributed counters.
The problem I have is that with distributed counters you need to pick how many shards you want to have. This is addressed here as well. For example some profiles may get a lot more views per second than others (one profile may be a famous person while another is a regular person), and therefore need more shards.
Is there a way to write a distributed counter that can scale its shards up if the page is getting a lot of views per second?
I was thinking of detecting a datastore contention error and then adding more shards if that happens.
I noticed there is a new extension for Cloud Firestore that seems to do what I am asking for. However, I am not using Cloud Firestore; I am using Firestore in Datastore mode, which is similar under the hood but still different.
CodePudding user response:
The original Datastore distributed counters example:
import random
from google.appengine.ext import ndb  # assumed: the legacy App Engine NDB library

NUM_SHARDS = 20

class SimpleCounterShard(ndb.Model):
    """Shards for the counter"""
    count = ndb.IntegerProperty(default=0)

def get_count():
    """Retrieve the value for a given sharded counter.
    Returns:
        Integer; the cumulative count of all sharded counters.
    """
    total = 0
    for counter in SimpleCounterShard.query():
        total += counter.count
    return total

@ndb.transactional
def increment():
    """Increment the value for a given sharded counter."""
    # Pick one shard at random so concurrent writes spread across entities.
    shard_string_index = str(random.randint(0, NUM_SHARDS - 1))
    counter = SimpleCounterShard.get_by_id(shard_string_index)
    if counter is None:
        counter = SimpleCounterShard(id=shard_string_index)
    counter.count += 1
    counter.put()
This example uses a fixed number of shards, whereas the Firestore example uses a separate entity to keep track of the number of shards. So you can update the code above with something like:
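class RootCounter(ndb.Model):
    """Root counter that also tracks how many shards exist."""
    count = ndb.IntegerProperty(default=0)
    num_shards = ndb.IntegerProperty(default=0)

    def get_count(self):
        # Include the root count: it holds increments made before sharding.
        total = self.count
        if self.num_shards > 0:
            total += sum(e.count for e in SimpleCounterShard.query(ancestor=self.key))
        return total

    def increment(self):
        try:
            self._increment()
        except Exception:  # e.g. a contention/transaction failure
            # Add a shard, retry, then persist the new shard count.
            self.num_shards += 1
            self.increment()
            self.put()

    @ndb.transactional(retries=1)
    def _increment(self):
        if self.num_shards > 0:
            # Pick a random shard stored as a child of this root counter.
            shard_id = str(random.randint(0, self.num_shards - 1))
            shard = SimpleCounterShard.get_by_id(shard_id, parent=self.key)
            if shard is None:
                shard = SimpleCounterShard(id=shard_id, parent=self.key)
            shard.count += 1
            shard.put()
        else:
            self.count += 1
            self.put()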
The important difference since Firestore in Datastore mode was released is that it is strongly consistent and that you are likely not using entity groups. A query therefore gives an exact answer, and the sharded counters fit nicely into the hierarchy under the root counter.
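If it helps to see the grow-on-contention idea without a datastore in the way, here is a rough in-memory sketch. It is only a toy model, not the NDB code above: `ContentionError`, `InMemoryGrowingCounter`, the `contended` flag, and `max_shards` are all hypothetical names made up for illustration, and the contention is simulated by the caller rather than detected from a real transaction failure.

```python
import random


class ContentionError(Exception):
    """Hypothetical stand-in for a datastore contention/transaction error."""


class InMemoryGrowingCounter:
    """Toy model of a root counter whose shard count grows on contention."""

    def __init__(self, max_shards=64):
        self.count = 0        # increments taken before any shard existed
        self.shards = []      # one integer per shard
        self.max_shards = max_shards

    def _write(self, contended):
        # Simulate one transactional write; `contended` forces a failure.
        if contended:
            raise ContentionError("too many writes to one entity")
        if self.shards:
            index = random.randrange(len(self.shards))
            self.shards[index] += 1
        else:
            self.count += 1

    def increment(self, contended=False):
        try:
            self._write(contended)
        except ContentionError:
            # Add a shard (up to a cap), then retry once without contention.
            if len(self.shards) < self.max_shards:
                self.shards.append(0)
            self._write(contended=False)

    def get_count(self):
        # The root count still matters after sharding starts.
        return self.count + sum(self.shards)
```

Capping the number of shards is a design choice added here so that a burst of failures cannot grow the shard list without bound; the right cap, and whether to retry more than once, would depend on your traffic.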