I'm using Elassandra 6.2.3. I have set up a cluster of 3 nodes and created a keyspace with a replication factor of 2. The Cassandra configuration uses Murmur3Partitioner and num_tokens=8.
CREATE KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2'} AND durable_writes = true;
DESC mykeyspace;
CREATE TABLE mykeyspace.mytable(
f1 text,
f2 timestamp,
f3 text,
f4 text,
f5 int,
PRIMARY KEY (f1, f2)
) WITH CLUSTERING ORDER BY (f2 DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE CUSTOM INDEX elastic_mytable_idx ON mykeyspace.mytable () USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
Here is the ring:
$ nodetool ring mykeyspace
Datacenter: DC1
==========
Address Rack Status State Load Owns Token
8046893569169204194
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% -7885825037800730315
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% -7261086042602187969
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% -7247943966600989463
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% -7228717159480176131
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% -6939207504674930480
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% -6158757762234956967
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% -4699623277895141955
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% -4269715227726417275
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% -3148156422280710025
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% -2567971232125784764
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% -2187229040967677675
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% -2058807466377445130
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% -1181919914747129817
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% 695306942662545127
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% 1989050017548537421
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% 2881433693910708029
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% 3454959670543032324
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% 3833350227892101457
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% 4855735318033934682
10.8.0.6 r1 Up Normal 30.77 GiB 86.15% 6288034337780481749
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% 6495870875989416002
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% 6853344637592889364
10.8.0.14 r1 Up Normal 7.36 GiB 49.61% 6911496393497851249
10.8.0.1 r1 Up Normal 17.55 GiB 64.24% 8046893569169204194
I have created a testing program that generates 1M rows of random data and sends them to Cassandra via the Node.js cassandra-driver library. The testing program generates data with roughly 2900 distinct partition keys (f1) and distinct clustering keys (f2).
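For context, here is a minimal sketch of what such a load generator looks like with cassandra-driver. The contact points, key pattern, and column values are illustrative placeholders, not the actual test program:

```typescript
// Minimal load-generator sketch (assumes the Node.js "cassandra-driver" package
// and the mykeyspace.mytable schema shown above; values are illustrative).
import { Client } from 'cassandra-driver';

const client = new Client({
  contactPoints: ['10.8.0.1', '10.8.0.6', '10.8.0.14'], // assumed node addresses
  localDataCenter: 'DC1',
  keyspace: 'mykeyspace',
});

const INSERT = 'INSERT INTO mytable (f1, f2, f3, f4, f5) VALUES (?, ?, ?, ?, ?)';

async function run(): Promise<void> {
  await client.connect();
  for (let i = 0; i < 1000000; i++) {
    // ~2900 distinct partition keys (f1), varying clustering keys (f2).
    const f1 = `key-${i % 2900}`;
    const f2 = new Date(Date.now() - Math.floor(Math.random() * 1e9));
    await client.execute(INSERT, [f1, f2, 'some text', 'more text', i], { prepare: true });
  }
  await client.shutdown();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```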
The result is that the data is distributed like this:
$ nodetool status mykeyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.8.0.14 12.47 GiB 8 49.6% c5a17dbd-40ef-4f58-b132-0d977a92f1a1 r1
UN 10.8.0.1 17.55 GiB 8 64.2% f088d009-bd97-4e35-9f20-60006a68b363 r1
UN 10.8.0.6 33.49 GiB 8 86.1% e82191ad-9d9f-459f-9da0-2b0457ad6611 r1
Why does one node have almost double the load of the other two nodes?
Thanks
CodePudding user response:
In Cassandra, vnode tokens are allocated randomly, so the resulting token ranges may differ in size.
In practice, this effect is mitigated by using a high number of tokens.
If you want a better distribution, you could set num_tokens to 256.
But there is a drawback: a high number of tokens increases the cost of search requests in Elassandra, because token ranges are used to filter out duplicate results. See the Elassandra search path documentation for more information.
I would recommend not setting num_tokens higher than 16, and ideally benchmarking different settings to fit your specific needs.
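For reference, num_tokens is a per-node setting in cassandra.yaml, and it only takes effect when a node first bootstraps; a node that already owns tokens keeps them, so in practice changing it means rebuilding or replacing the nodes. A minimal illustration, using the suggested value of 16:

```yaml
# cassandra.yaml (per node) -- only applied when the node first joins the ring;
# existing nodes keep their current tokens and must be re-bootstrapped to change this.
num_tokens: 16
```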