We have about 1k topics on a live Kafka cluster. At the moment we have 6 brokers (IDs 1, 2, 3, 4, 5, 6) across 3 data centers. Our default replication factor at cluster level is set to 3. Due to an unavoidable situation, we are losing one DC (broker IDs 1 and 2), so we have done a partition reassignment that moves all partitions to brokers 3, 4, 5 and 6. In addition, for higher fault tolerance we want to increase the replication factor to 4 for all existing topics.
Below is a small sample of the generated topic partitions. The plan is to keep the existing partition reassignment and just append the missing broker, e.g.
my_topic_1 p0 replicas are [4, 5, 3], and I would like this to be updated to [4, 5, 3, 6]
my_topic_2 p0 replicas are [3, 6, 4], and I would like this to be updated to [3, 6, 4, 5]
my_topic_2 p1 replicas are [6, 4, 5], and I would like this to be updated to [6, 4, 5, 3]
Sample JSON below. I have been trying combinations of grep, sed and jq to get the replica list for each partition, e.g.
my_topic_1 p0 has the replica list [4, 5, 3]. Compare it with a master list of the brokers remaining in the cluster, [3, 4, 5, 6], and append the missing broker: broker 6 is missing from the partition's list, so add 6 and the replica list becomes [4, 5, 3, 6].
I would appreciate some suggestions.
{
"version": 1,
"partitions": [{
"topic": "my_topic_1",
"partition": 0,
"replicas": [4, 5, 3],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 0,
"replicas": [3, 6, 4],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 1,
"replicas": [6, 4, 5],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 2,
"replicas": [4, 5, 3],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 3,
"replicas": [5, 3, 6],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 4,
"replicas": [3, 5, 6],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 5,
"replicas": [6, 3, 4],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 0,
"replicas": [4, 6, 5],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 1,
"replicas": [5, 4, 3],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 2,
"replicas": [3, 5, 6],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 3,
"replicas": [6, 3, 4],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 4,
"replicas": [4, 6, 5],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 5,
"replicas": [5, 4, 3],
"log_dirs": ["any", "any", "any"]
}]
}
CodePudding user response:
I don't know what all of this means, but to add the missing item of [3,4,5,6] to a given array, just append one item whose value is 18 (which is 3 + 4 + 5 + 6) minus the current sum (add) of the array:
jq '.partitions[].replicas |= . + [18 - add]' file.json
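The arithmetic behind that filter can be checked by hand. Below is a minimal sketch of the same idea in Python rather than jq, using the first partition from the question. The variable names are only illustrative. Note the trick relies on each replica list missing exactly one broker from the full list and containing no duplicates.

```python
# Full broker list and one replica list, both taken from the question.
full = [3, 4, 5, 6]
replicas = [4, 5, 3]

# The single missing broker is the difference of the two sums.
missing = sum(full) - sum(replicas)

# Append it, keeping the existing replica order.
updated = replicas + [missing]
print(updated)
```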
To make it more generic, you can provide the full array as a parameter using --argjson:
jq --argjson full '[3,4,5,6]' '
  .partitions[].replicas |= . + [($full | add) - add]
' file.json
Or generate the full list from the arrays at hand by making an array of unique values:
jq '
  ([.partitions[].replicas[]] | unique | add) as $sum
  | .partitions[].replicas |= . + [$sum - add]
' file.json
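If a partition could be missing more than one broker, the sum trick no longer applies. A set-style comparison against the master list, as described in the question, still works. The following is a rough sketch in Python, not jq; add_missing_brokers is a hypothetical helper name. It also pads log_dirs with "any", on the assumption that the reassignment tool expects log_dirs to have the same length as replicas. Verify that against your Kafka version.

```python
import json

def add_missing_brokers(plan, brokers):
    """Return a copy of the plan with every broker a partition lacks appended."""
    partitions = []
    for p in plan["partitions"]:
        # Brokers from the master list not yet in this replica list.
        missing = [b for b in brokers if b not in p["replicas"]]
        entry = dict(p, replicas=p["replicas"] + missing)
        if "log_dirs" in p:
            # Assumption: log_dirs must stay the same length as replicas.
            entry["log_dirs"] = p["log_dirs"] + ["any"] * len(missing)
        partitions.append(entry)
    return dict(plan, partitions=partitions)

# Two partitions copied from the sample JSON in the question.
sample = {
    "version": 1,
    "partitions": [
        {"topic": "my_topic_1", "partition": 0,
         "replicas": [4, 5, 3], "log_dirs": ["any", "any", "any"]},
        {"topic": "my_topic_2", "partition": 1,
         "replicas": [6, 4, 5], "log_dirs": ["any", "any", "any"]},
    ],
}

result = add_missing_brokers(sample, [3, 4, 5, 6])
print(json.dumps(result, indent=2))
```

For the real file, the plan could be read with json.load and the result written back with json.dump before handing it to the reassignment tool.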