I have read that columnar databases are well suited to aggregate queries, and that Cassandra is a columnar database. I am trying to use COUNT with a 'between' or '>=' range condition within a specific partition in Cassandra. Is this performance intensive?
Answer:
Cassandra is a partitioned row store. Data is stored in partitions; within each partition, rows are clustered (sorted) by the clustering columns and served as "rows." It is not a columnar database.
A COUNT aggregate query will not perform well on Cassandra: to return a single number, the coordinator still has to read every matching row from the replicas. Attempting it will be performance intensive, right up until the coordinator node times out the query.
If this is a use case you need to solve for, another database will be the better option.
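To make the difference concrete, here is a toy Python model. It is not Cassandra's real storage engine: replication, token ranges, and SSTables are ignored, and the class and method names are hypothetical. Each partition key is hashed to one of several nodes. A table-wide count has to contact every node and read every row, while a single-partition count only touches the node that owns that partition.

```python
from collections import defaultdict
import hashlib


class ToyPartitionedStore:
    """Toy model of a partitioned row store (hypothetical, not Cassandra's API).

    Rows are grouped by partition key, and each partition lives on exactly one
    of `num_nodes` nodes, chosen by hashing the key. Replication is ignored.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # node index -> {partition key -> list of rows}
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def _node_for(self, partition_key):
        # Stable hash, so a given partition always maps to the same node.
        digest = hashlib.md5(repr(partition_key).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_nodes

    def insert(self, partition_key, row):
        self.nodes[self._node_for(partition_key)][partition_key].append(row)

    def count_partition(self, partition_key):
        """COUNT within one partition: only the owning node is contacted."""
        node = self._node_for(partition_key)
        rows = self.nodes[node].get(partition_key, [])
        count = sum(1 for _ in rows)  # every row in the partition is read
        return count, {node}

    def count_table(self):
        """Table-wide COUNT: every node is asked and every row is read."""
        total, touched = 0, set()
        for node_id, partitions in enumerate(self.nodes):
            touched.add(node_id)
            for rows in partitions.values():
                total += sum(1 for _ in rows)
        return total, touched


store = ToyPartitionedStore(num_nodes=3)
for user in range(100):
    for event in range(10):
        store.insert(("user", user), {"event": event})

rows, nodes = store.count_partition(("user", 7))
print(rows, len(nodes))      # 10 1
total, nodes = store.count_table()
print(total, sorted(nodes))  # 1000 [0, 1, 2]
```

In the toy, the table-wide count grows with the size of the whole table and involves every node, which is the work a real coordinator has to orchestrate before it times out.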
Answer:
Adding to @aaron's response: if you're performing the aggregate operation within a single partition, that might be okay. For example, assume your table schema is as follows:
CREATE TABLE IF NOT EXISTS keyspace_name.table_name (
    partition_key1 some_type,
    partition_key2 some_type,
    clustering_key1 some_type,
    clustering_key2 some_other_type,
    regular_column1 some_type,
    ...
    regular_columnN some_type,
    PRIMARY KEY ((partition_key1, partition_key2), clustering_key1, clustering_key2)
) WITH CLUSTERING ORDER BY (clustering_key1 DESC, clustering_key2 DESC)
  AND ...;
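Then an aggregate query restricted to a single partition, such as the following, may perform acceptably, because the replicas only need to read a contiguous slice of that one partition:
SELECT COUNT(regular_column1) FROM keyspace_name.table_name WHERE partition_key1 = ? AND partition_key2 = ? AND clustering_key1 >= ? AND clustering_key1 <= ?;
Note that CQL only allows a range restriction on clustering_key2 if clustering_key1 is restricted by equality, e.g. clustering_key1 = ? AND clustering_key2 >= ? AND clustering_key2 <= ?.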
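Why a single-partition range count can be acceptable is easiest to see in a toy model. The sketch below is hypothetical: it models one partition with a single clustering column in ascending order (the schema above uses DESC) and ignores memtables, SSTables, and tombstones. Rows are kept sorted by clustering key, the start and end of the range are found by binary search, and the matching rows are then read as one contiguous slice. The work still grows with the number of rows counted, so a very wide range in a very large partition can still be slow.

```python
import bisect


class ToyPartition:
    """Toy model of one partition: rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = []  # sorted clustering keys
        self._rows = []  # row payloads, parallel to self._keys

    def insert(self, clustering_key, row):
        i = bisect.bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            self._rows[i] = row  # same primary key: overwrite (upsert)
        else:
            self._keys.insert(i, clustering_key)
            self._rows.insert(i, row)

    def count_range(self, lo, hi):
        """Emulate COUNT ... WHERE ck >= lo AND ck <= hi in this partition."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        count = 0
        # The model reads each matching row, mirroring that a COUNT still has
        # to read the rows it counts; cost grows with the size of the slice.
        for _row in self._rows[start:end]:
            count += 1
        return count


partition = ToyPartition()
for ts in range(0, 1000, 10):  # clustering keys 0, 10, ..., 990
    partition.insert(ts, {"value": ts})

print(partition.count_range(100, 200))  # 11
```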