Firestore in Datastore mode - a distributed counter that can scale its shards up based on traf

Time:08-28

In Firestore in Datastore mode, the recommended way to store a high-write counter (such as profile views on a website) is to use sharded (distributed) counters.

The problem I have is that with distributed counters you need to pick how many shards you want to have. This is addressed here as well. For example some profiles may get a lot more views per second than others (one profile may be a famous person while another is a regular person), and therefore need more shards.

Is there a way to write a distributed counter that can scale its shards up if the page is getting a lot of views per second?

I was thinking of detecting a datastore contention error and then adding more shards if that happens.

I noticed there is a new extension for Cloud Firestore that seems to do what I am asking for. However, I am not using Cloud Firestore; I am using Firestore in Datastore mode, which is similar under the hood but still different.

CodePudding user response:

The original Datastore distributed counters example:

import random

from google.appengine.ext import ndb

NUM_SHARDS = 20

class SimpleCounterShard(ndb.Model):
    """Shards for the counter"""
    count = ndb.IntegerProperty(default=0)


def get_count():
    """Retrieve the value for a given sharded counter.

    Returns:
        Integer; the cumulative count of all sharded counters.
    """
    total = 0
    for counter in SimpleCounterShard.query():
        total += counter.count
    return total


@ndb.transactional
def increment():
    """Increment the value for a given sharded counter."""
    shard_string_index = str(random.randint(0, NUM_SHARDS - 1))
    counter = SimpleCounterShard.get_by_id(shard_string_index)
    if counter is None:
        counter = SimpleCounterShard(id=shard_string_index)
    counter.count += 1
    counter.put()

The original example uses a fixed number of shards, but the Firestore example uses a separate entity to keep track of the number of shards. So you can update the code above with something like:

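class RootCounter(ndb.Model):
  count = ndb.IntegerProperty(default=0)
  num_shards = ndb.IntegerProperty(default=0)

  def get_count(self):
    # Writes made before any shard existed stay on the root entity.
    total = self.count
    if self.num_shards > 0:
      # Ancestor query over the shards stored under this root counter.
      total += sum(e.count for e in SimpleCounterShard.query(ancestor=self.key))
    return total

  def increment(self):
    try:
      self._increment()
    except Exception:
      # Treated as contention here; narrow this to your library's
      # transaction-failure exception. Add a shard, retry, then persist.
      self.num_shards += 1
      self.increment()
      self.put()

  @ndb.transactional(retries=1)
  def _increment(self):
    if self.num_shards > 0:
      # Assumes SimpleCounterShard gains an increment(parent, num_shards)
      # classmethod that picks a random shard under `parent` (not shown).
      SimpleCounterShard.increment(parent=self.key, num_shards=self.num_shards)
    else:
      self.count += 1
      self.put()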

The important difference since the release of Firestore in Datastore mode is that it is strongly consistent and that you are likely not relying on entity groups. A query therefore returns an exact total, and the sharded counters fit naturally in the key hierarchy under the root counter.
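To see the grow-on-contention idea in isolation, here is a rough in-memory sketch that does not touch Datastore at all. Everything in it is hypothetical: `ContentionError` stands in for whatever transaction failure your client library raises, `AdaptiveCounter` and `tick()` are made-up names, and the "one write per shard per tick" budget is only a stand-in for real write contention.

```python
import random


class ContentionError(Exception):
    """Hypothetical stand-in for a datastore transaction-contention failure."""


class AdaptiveCounter:
    """In-memory model of a sharded counter that adds shards under contention."""

    def __init__(self, writes_per_shard_per_tick=1):
        self.shards = [0]  # start with a single shard
        self.capacity = writes_per_shard_per_tick
        self._writes_this_tick = {}  # shard index -> writes in the current tick

    def tick(self):
        """Start a new time window, which resets every shard's write budget."""
        self._writes_this_tick = {}

    def _try_increment(self, rng):
        # Pick a random shard, as the fixed-shard example does.
        index = rng.randrange(len(self.shards))
        used = self._writes_this_tick.get(index, 0)
        if used >= self.capacity:
            # The shard is already saturated in this window.
            raise ContentionError(index)
        self._writes_this_tick[index] = used + 1
        self.shards[index] += 1

    def increment(self, rng=random):
        """Increment once, adding one shard each time a write is rejected."""
        while True:
            try:
                self._try_increment(rng)
                return
            except ContentionError:
                self.shards.append(0)  # scale up, then retry

    def get_count(self):
        # The total is the sum over all shards.
        return sum(self.shards)


if __name__ == "__main__":
    counter = AdaptiveCounter()
    for _ in range(50):  # a burst of 50 views inside one window
        counter.increment()
    print(counter.get_count(), "views across", len(counter.shards), "shards")
```

A quiet profile stays at one shard, while a burst on a popular profile pushes the shard count up. In a real implementation the shard count would live on the root entity, and this sketch never shrinks the shard list, so it tends to overshoot under random shard selection.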
