Mongodb - hostname/IP changed for all hosts in sharding with replica set configuration

Time:05-19

Here is my Mongo cluster configuration (sharding, with each shard a replica set):

replica sets:
rs0 - IP1, IP2, IP3 || port - 27017
rs1 - IP4, IP5, IP6 || port - 27017

config server replica set - IP7, IP8, IP9 || port - 26017
mongos - IP7, IP8, IP9 || port - 26000

This is a test setup, and it was configured using IPs (not hostnames). Unfortunately, all hosts went down during maintenance, and every host IP changed when we brought the nodes back up. As a result, the replica set members (mongod), the config servers (mongod) and mongos didn't come up, because the IP addresses in their configuration were unreachable.

To bring the setup back up, I did the following:

  1. Updated replica set host IP addresses following https://www.mongodb.com/docs/v4.2/tutorial/change-hostnames-in-a-replica-set/
  2. Updated the config server replica set host IPs following the same MongoDB document, and started the mongod services without sharding.
  3. I didn't find any proper documentation on changing the config server and mongos IP addresses/hostnames, so on the config server replica set I updated the "shards" collection in the config db:
cfg1 = db.shards.findOne( { "_id": "rs0" } )
cfg1.host = "rs0/new_IP1:27017,new_IP2:27017,new_IP3:27017"
db.shards.update({ "_id" : "rs0" } , cfg1 )

cfg2 = db.shards.findOne( { "_id": "rs1" } )
cfg2.host = "rs1/new_IP4:27017,new_IP5:27017,new_IP6:27017"
db.shards.update({ "_id" : "rs1" } , cfg2 )
  4. Started the config servers and mongos successfully.
  5. I'm now restarting the replica set members so they can use sharding. However, the replica set mongod processes do not start, citing references to the old config server replica set IPs. I get the following error in mongod.log:
2022-05-17T21:20:39.654+0530 W SHARDING [initandlisten] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set csrs
2022-05-17T21:20:40.154+0530 I ASIO     [ReplicaSetMonitor-TaskExecutor] Connecting to x.x.x.x:26017
2022-05-17T21:20:41.655+0530 I ASIO     [ReplicaSetMonitor-TaskExecutor] Connecting to y.y.y.y:26017
2022-05-17T21:20:42.660+0530 I ASIO     [ReplicaSetMonitor-TaskExecutor] Failed to connect to z.z.z.z:26017 - HostUnreachable: Error connecting to 10.0.13.206:26017 :: caused by :: No route to host
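Step 3 of the question edits each config.shards document by hand, and a mistyped address in one of those seed lists is easy to miss. As a minimal sketch (plain JavaScript, runnable under Node.js; the functions can also be pasted into mongosh), the host strings can be generated from a single mapping of shard name to new addresses. The helper names shardHostString and shardUpdates and the new_IPn placeholders are hypothetical, and the sketch assumes, as in the question, that each shard's _id equals its replica set name.

```javascript
// Hypothetical mapping of each shard replica set to its NEW member addresses.
// Replace the placeholder values with the real new IPs/hostnames.
const newShardHosts = {
  rs0: ["new_IP1:27017", "new_IP2:27017", "new_IP3:27017"],
  rs1: ["new_IP4:27017", "new_IP5:27017", "new_IP6:27017"],
};

// Build the "<setName>/<host:port>,<host:port>,..." seed-list string that
// config.shards stores in its "host" field.
function shardHostString(setName, hosts) {
  if (!hosts || hosts.length === 0) {
    throw new Error(`no hosts given for replica set ${setName}`);
  }
  return `${setName}/${hosts.join(",")}`;
}

// Produce one [filter, update] pair per shard. Assumes shard _id == set name.
function shardUpdates(mapping) {
  return Object.entries(mapping).map(([setName, hosts]) => [
    { _id: setName },
    { $set: { host: shardHostString(setName, hosts) } },
  ]);
}

for (const [filter, update] of shardUpdates(newShardHosts)) {
  console.log(JSON.stringify(filter), JSON.stringify(update));
}

// In mongosh, on the config server primary, the pairs could be applied with:
// shardUpdates(newShardHosts).forEach(([f, u]) =>
//   db.getSiblingDB("config").shards.updateOne(f, u));
```

Using $set on the host field touches only that field, rather than replacing the whole document as the findOne/update pattern in the question does.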

I couldn't find any help on the web for recovering from this scenario. I'm requesting assistance in recovering the setup without losing any data, as we have loaded TBs of data onto this cluster.

User response:

I ran this procedure as a test on my local machine. It seems to work, but I cannot guarantee anything.

  • Stop all mongod/mongos services on all nodes

mongod Config ReplicaSet

  • Start one mongod config server in maintenance mode
  • Drop local database
  • Update config.shards
  • Shutdown mongod
  • Delete dbPath of all other config servers
  • Start all mongod config servers
  • Connect to first mongod config server
  • Initiate ReplicaSet

Example (Windows style):

SET MAINTENANCE_LOG=--logpath C:\MongoDB\log\mongo_maintenance.log --logappend
SET MAINTENANCE_NET=--bind_ip localhost --port 55555
SET MAINTENANCE_MISC=--setParameter skipShardingConfigurationChecks=true --setParameter disableLogicalSessionCacheRefresh=true


start mongod --dbpath C:\MongoDB\data\mongocfg_1 %MAINTENANCE_LOG% %MAINTENANCE_MISC% %MAINTENANCE_NET%
mongo --norc localhost:55555/admin 
db.getSiblingDB('local').dropDatabase()
db.getSiblingDB('config').getCollection("shards").updateOne(
   {_id : "shard_01"}, 
   {$set: {host: "shard_01/<new_IP:port>,<new_IP:port>" }}
)
db.getSiblingDB('config').getCollection("shards").updateOne(
   {_id : "shard_02"}, 
   {$set: {host: "shard_02/<new_IP:port>,<new_IP:port>" }}
)
db.getSiblingDB('config').getCollection("shards").updateOne(
   {_id : "shard_03"}, 
   {$set: {host: "shard_03/<new_IP:port>,<new_IP:port>" }}
)
db.getSiblingDB('admin').shutdownServer()
exit

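REM rmdir needs /S /Q for non-empty directories; mongod will not start with a missing dbPath, so recreate them empty
rmdir /S /Q C:\MongoDB\data\mongocfg_2
rmdir /S /Q C:\MongoDB\data\mongocfg_3
mkdir C:\MongoDB\data\mongocfg_2
mkdir C:\MongoDB\data\mongocfg_3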

net start MongoDB_Config_1
net start MongoDB_Config_2
net start MongoDB_Config_3

mongo "mongodb://user:password@localhost:27029/admin?authSource=admin"
rs.initiate(
  {
    _id: "configRepSet",
    configsvr: true,
    members: [
      { _id: 0, host: "<new_IP:port>", priority: 10 },
      { _id: 1, host: "<new_IP:port>", priority: 5 },
      { _id: 2, host: "<new_IP:port>", priority: 5 }
    ]
  }
)
rs.status()
while (! db.hello().isWritablePrimary ) { sleep(1000) }
exit
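The rs.initiate() document above is written by hand, and the shard section below needs the same shape again (with an arbiter) for every shard. As a minimal sketch in plain JavaScript (Node.js or mongosh), a helper can generate it from a host list. The function name buildReplSetConfig and the new_IPn:26017 addresses are hypothetical; the priorities (10 for the first host, 5 for the rest) simply mirror the example above. Config server replica sets cannot contain arbiters, so the helper rejects that combination.

```javascript
// Build a replica set configuration document for rs.initiate().
// Priorities mirror the example: 10 for the first (preferred primary) host,
// 5 for the rest. Hosts listed in `arbiters` become arbiterOnly members.
function buildReplSetConfig(setName, hosts, { configsvr = false, arbiters = [] } = {}) {
  if (hosts.length === 0) {
    throw new Error(`no hosts given for replica set ${setName}`);
  }
  if (configsvr && arbiters.length > 0) {
    // Config server replica sets cannot have arbiters.
    throw new Error("a config server replica set cannot contain arbiters");
  }
  const members = hosts.map((host, i) =>
    arbiters.includes(host)
      ? { _id: i, host, arbiterOnly: true }
      : { _id: i, host, priority: i === 0 ? 10 : 5 }
  );
  const config = { _id: setName, members };
  if (configsvr) {
    config.configsvr = true; // required only for the config server set
  }
  return config;
}

// Placeholder addresses: substitute the new IP:port of each config server.
const cfg = buildReplSetConfig(
  "configRepSet",
  ["new_IP7:26017", "new_IP8:26017", "new_IP9:26017"],
  { configsvr: true }
);
console.log(JSON.stringify(cfg, null, 2));
// In mongosh, connected to the first config server: rs.initiate(cfg)
```

For a shard, the same call without configsvr and with arbiters: ["<arbiter IP:port>"] yields the shape of the shard_01 document.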

mongod Shard ReplicaSet

Repeat the following for each shard:

  • Start one mongod shard server (preferably the former PRIMARY) in maintenance mode
  • Drop local database
  • Update admin.system.version
  • Shutdown mongod
  • Delete dbPath of all other shard servers
  • Start all mongod shard servers
  • Connect to first mongod shard server
  • Initiate ReplicaSet

Example (Windows style):

SET MAINTENANCE_LOG=--logpath C:\MongoDB\log\mongo_maintenance.log --logappend
SET MAINTENANCE_NET=--bind_ip localhost --port 55555
SET MAINTENANCE_MISC=--setParameter skipShardingConfigurationChecks=true --setParameter disableLogicalSessionCacheRefresh=true

start mongod --dbpath C:\MongoDB\data\mongoshard_1prim %MAINTENANCE_LOG% %MAINTENANCE_MISC% %MAINTENANCE_NET%
mongo --norc localhost:55555/admin 
db.getSiblingDB('local').dropDatabase()
db.getSiblingDB('admin').getCollection("system.version").updateOne(
   {_id : "shardIdentity"}, 
   {$set: { configsvrConnectionString: "configRepSet/<new_IP:port>,<new_IP:port>,<new_IP:port>" }}
)
db.getSiblingDB('admin').shutdownServer()
exit

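REM rmdir does not accept wildcards; remove the directories and recreate them empty
rmdir /S /Q C:\MongoDB\data\mongoshard_1sec
rmdir /S /Q C:\MongoDB\data\mongoshard_1arb
mkdir C:\MongoDB\data\mongoshard_1sec
mkdir C:\MongoDB\data\mongoshard_1arb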

net start MongoDB_Shard_1prim
net start MongoDB_Shard_1sec
net start MongoDB_Shard_1arb


mongo "mongodb://user:password@localhost:37028/admin?authSource=admin"
rs.initiate(
  {
    _id: "shard_01",
    members: [
      { _id: 0, host: "<new_IP:port>", priority: 10 },
      { _id: 1, host: "<new_IP:port>", priority: 5 },
      { _id: 2, host: "<new_IP:port>", arbiterOnly: true }
    ]
  }
)
rs.status()
while (! db.hello().isWritablePrimary ) { sleep(1000) }
exit
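The while (! db.hello().isWritablePrimary) loop only waits for a primary. Because the other members' dbPaths were wiped, they must perform a full initial sync from the primary, and with TBs of data they can remain in STARTUP2 for a long time. A minimal sketch (plain JavaScript for Node.js or mongosh; checkReplSetStatus and the sample document are hypothetical) that inspects the members array of rs.status() before moving on to the next shard:

```javascript
// Summarise the `members` array returned by rs.status() and report whether
// the set looks healthy: exactly one PRIMARY and every other member in
// SECONDARY or ARBITER state with health 1.
function checkReplSetStatus(status) {
  const problems = [];
  const primaries = status.members.filter((m) => m.stateStr === "PRIMARY");
  if (primaries.length !== 1) {
    problems.push(`expected 1 PRIMARY, found ${primaries.length}`);
  }
  for (const m of status.members) {
    if (m.health !== 1) {
      problems.push(`${m.name} is unreachable (health ${m.health})`);
    } else if (!["PRIMARY", "SECONDARY", "ARBITER"].includes(m.stateStr)) {
      problems.push(`${m.name} is in state ${m.stateStr}`);
    }
  }
  return { ok: problems.length === 0, problems };
}

// Hypothetical, trimmed-down rs.status() result while a member is still syncing.
const sample = {
  set: "shard_01",
  members: [
    { name: "new_IP1:27017", health: 1, stateStr: "PRIMARY" },
    { name: "new_IP2:27017", health: 1, stateStr: "STARTUP2" },
    { name: "new_IP3:27017", health: 1, stateStr: "ARBITER" },
  ],
};
console.log(checkReplSetStatus(sample));
// In mongosh, connected to the shard: checkReplSetStatus(rs.status())
```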

mongos Router

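This is the simplest part. mongos stores no data of its own, so it only needs its config server connection string (the sharding.configDB setting, or --configdb on the command line) changed to the new config server addresses before it is restarted.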

User response:

The issue is solved now.

The issue is now solved. The final piece of the puzzle was finding where the replica set mongod stores the config server connection info: it is in the system.version collection in the admin db. I followed these steps:

  1. Started mongod on all replica set members with security authorization, replication and sharding disabled, after making the necessary changes to the config file.
  2. In the admin db, the following two documents in system.version contained the config server connection string:

db.system.version.find( {"_id" : { $in : [ "shardIdentity" , "minOpTimeRecovery" ]} })

  3. Updated both documents with the new config server connection string via the db.system.version.update command.
  4. Shut down the mongod processes and re-enabled security authorization, replication and sharding in the mongod config file.
  5. Successfully started the replica set mongod instances.
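This answer rewrites the config server connection string in two admin.system.version documents, shardIdentity and minOpTimeRecovery. A minimal sketch in plain JavaScript (Node.js or mongosh) that builds the matching filter/update pairs is below. The helper name systemVersionUpdates is hypothetical, and it assumes the field is named configsvrConnectionString in both documents, as it is for shardIdentity in the first answer; running the find() query above first to confirm the actual field names is advisable. "csrs" is the config replica set name seen in the question's log; the addresses are placeholders.

```javascript
// Build filter/update pairs for the two admin.system.version documents that
// hold the config server connection string on a shard member.
// ASSUMPTION: both documents use the field name "configsvrConnectionString".
function systemVersionUpdates(configRsName, configHosts) {
  if (configHosts.length === 0) {
    throw new Error("no config server hosts given");
  }
  const connStr = `${configRsName}/${configHosts.join(",")}`;
  return ["shardIdentity", "minOpTimeRecovery"].map((id) => ({
    filter: { _id: id },
    update: { $set: { configsvrConnectionString: connStr } },
  }));
}

// Placeholder config server addresses; replace with the real new IPs.
const ops = systemVersionUpdates("csrs", ["new_IP7:26017", "new_IP8:26017", "new_IP9:26017"]);
console.log(JSON.stringify(ops, null, 2));

// In the shell on a shard member started standalone (no replication/sharding):
// ops.forEach(({ filter, update }) =>
//   db.getSiblingDB("admin").system.version.updateOne(filter, update));
```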

Note: I am new to MongoDB and not sure whether we should be modifying internal system collections. Since this was a test setup, I took the risk and ran these experiments, which paid off. This is not recommended in a production environment, as a resolution can't be guaranteed.
