Is Cassandra suitable for aggregate queries?

I have read that columnar databases are well suited to aggregate queries and that Cassandra is a columnar database. I am trying to run a count over values matching a 'between' or '>=' condition within a specific partition in Cassandra. Is this performance intensive?

CodePudding user response：

Cassandra is a partitioned row store. Data is stored in partitions, clustered together and served as "rows." It is not a columnar database.

An aggregate query that runs a count will not perform well on Cassandra. Attempting it is performance intensive, right up until the coordinator node times out the query.
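
As a rough illustration of the kind of query this warning applies to, here is a count with no partition key restriction. The keyspace and table names are placeholders, not taken from the question.

```cql
-- Placeholder names. With no WHERE clause on the partition key,
-- the coordinator has to gather rows from every partition in the
-- cluster before it can return a single number.
SELECT COUNT(*) FROM keyspace_name.table_name;
```

On a table of any real size, a query shaped like this is the one most likely to hit the read timeout.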

If this is a use case you need to solve for, another database will be the better option.

CodePudding user response：

Adding to @aaron's response: if you're performing an aggregate operation within a single partition, that might be okay.

Let's assume your table schema is as follows:

CREATE TABLE IF NOT EXISTS keyspace_name.table_name (
 partition_key1 some_type,
 partition_key2 some_type,
 clustering_key1 some_type,
 clustering_key2 some_other_type,
 regular_column1 some_type,
 ...
 regular_columnN some_type,
 PRIMARY KEY ((partition_key1, partition_key2), clustering_key1, clustering_key2)
) WITH CLUSTERING ORDER BY (clustering_key1 DESC, clustering_key2 DESC)
AND ...;

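With that schema, an aggregation query such as the following, which is restricted to a single partition, may be performant. Note that a range restriction is only valid on a clustering column when every preceding clustering column is restricted by equality, so the range here is placed on clustering_key1 alone:

SELECT COUNT(regular_column1) FROM keyspace_name.table_name WHERE partition_key1 = ? AND partition_key2 = ? AND clustering_key1 >= ? AND clustering_key1 <= ?;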
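
A variation on the query above, as a sketch using the same placeholder schema: to put a range on clustering_key2, CQL requires clustering_key1 to be pinned by equality first. COUNT(*) is used here to count rows rather than the non-null values of one column.

```cql
-- Same placeholder schema as above.
-- The full partition key is given, clustering_key1 is restricted by
-- equality, and only then is a range applied to clustering_key2.
SELECT COUNT(*)
FROM keyspace_name.table_name
WHERE partition_key1 = ?
  AND partition_key2 = ?
  AND clustering_key1 = ?
  AND clustering_key2 <= ?;
```

Either way the read stays inside one partition, so the cost grows with the number of rows in that partition's matching slice. A very wide partition can still make the count slow.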