Elassandra Search for Replicated Data

Time:10-04

How is the token_range decided in Elassandra when a query is distributed to the nodes?

What happens when the data is replicated across Elassandra node(s)?

How does the filtering of duplicate results take place?

CodePudding user response:

My understanding is that the queries go around the cluster in a manner similar to what Cassandra otherwise does.

Data replication is not a concern for the Elasticsearch side of things. It creates its own tables to hold its search information, and those tables are replicated through the standard Cassandra mechanism. If you understand how Cassandra replication works, the Elasticsearch data is handled the same way.

The filtering happens because each search node is given a non-overlapping range of tokens to take care of. In other words, one node is asked to return results for tokens 1, 2, 3, the next node for tokens 4, 5, 6, and the third node for tokens 7, 8, 9. Therefore there won't be any overlap, and no actual filtering of duplicates takes place.
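To make the 1-9 example concrete, here is a minimal Python sketch. It is a toy model, not Elassandra code: the node names, `RANGES`, `DOCS`, `local_search` and `search` are all made up for illustration, and it assumes every node holds a copy of every document.

```python
# Toy model of the example above (hypothetical names, not Elassandra code).
# Each node is assigned a non-overlapping token range.
RANGES = {
    "node1": range(1, 4),   # tokens 1, 2, 3
    "node2": range(4, 7),   # tokens 4, 5, 6
    "node3": range(7, 10),  # tokens 7, 8, 9
}

# Assumption: full replication, so every node stores every document.
DOCS = {f"doc{t}": t for t in range(1, 10)}  # document id -> token


def local_search(node):
    """Return only the documents whose token falls in this node's assigned range."""
    assigned = RANGES[node]
    return [doc for doc, token in DOCS.items() if token in assigned]


def search():
    """Concatenate the per-node results; no de-duplication step is needed."""
    results = []
    for node in RANGES:
        results.extend(local_search(node))
    return results


hits = search()
```

Even though each document exists on all three nodes in this model, each one is returned exactly once, because only one node is asked about its token.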

CodePudding user response:

Elassandra distributes the query to nodes according to the search_strategy_class of the targeted index. There are two strategies: PrimaryFirstSearchStrategy (the default) and RandomSearchStrategy.

Primary first search strategy

Every node is involved in the query, and each one is responsible for returning the documents it owns as a primary node. When a node is down, the next replica is used as a substitute.
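A rough sketch of that behavior in Python, as I understand it. This is not the actual Elassandra implementation: the three-node ring, `RF = 2`, the "next nodes on the ring" replica placement, and the names `replicas` and `primary_first_plan` are all assumptions made for illustration.

```python
# Toy model of a primary-first plan (hypothetical, not Elassandra code).
NODES = ["n1", "n2", "n3"]  # ring order
RF = 2                      # assumed replication factor


def replicas(primary_index):
    """Primary first, then the following nodes on the ring."""
    return [NODES[(primary_index + i) % len(NODES)] for i in range(RF)]


def primary_first_plan(alive):
    """Map each live node to the token ranges it has to answer for."""
    plan = {node: [] for node in NODES if node in alive}
    for i, primary in enumerate(NODES):
        token_range = f"range_of_{primary}"
        # Use the primary if it is up, otherwise the next live replica.
        owner = next((n for n in replicas(i) if n in alive), None)
        if owner is None:
            raise RuntimeError(f"no live replica for {token_range}")
        plan[owner].append(token_range)
    return plan


all_up = primary_first_plan({"n1", "n2", "n3"})
n2_down = primary_first_plan({"n1", "n3"})
```

With all nodes up, each node answers only for its own primary range. With n2 down, n3 answers for n2's range in addition to its own.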

Random search strategy

When RF > 1, the full ring can be reached with only a subset of nodes. The random search strategy takes advantage of this by randomly choosing such a subset of nodes to improve search efficiency.
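The idea can be sketched in Python as follows. This is only an illustration of "cover the full ring with a random subset of nodes", not the algorithm Elassandra actually uses: the four-node ring, `RF = 2`, the replica placement, and the names `replicas` and `random_plan` are assumptions.

```python
import random

# Toy model of a random covering plan (hypothetical, not Elassandra code).
NODES = ["n0", "n1", "n2", "n3"]  # ring order
RF = 2                            # assumed replication factor


def replicas(i):
    """Nodes holding range i: its primary plus the following nodes on the ring."""
    return [NODES[(i + k) % len(NODES)] for k in range(RF)]


def random_plan(rng):
    """Assign every token range to exactly one node, reusing chosen nodes when possible."""
    plan = {}
    order = list(range(len(NODES)))
    rng.shuffle(order)  # visit the ranges in random order
    for i in order:
        candidates = replicas(i)
        # Prefer a replica that is already part of the plan, so fewer nodes are queried.
        already_chosen = [n for n in candidates if n in plan]
        node = rng.choice(already_chosen or candidates)
        plan.setdefault(node, []).append(i)
    return plan


plan = random_plan(random.Random(0))
```

Each range is still searched on exactly one node, so the result set has no duplicates, but the nodes that take part change from query to query.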

Both strategies add a token_range filter to each sub-query according to the behavior described above. Therefore, the filtering happens locally on each node, not on the coordinator node.
