Windows Server 2016 Hyper-V Cluster Shared Volume not working properly

Time:11-15

The customer's production environment is a Windows Server 2016 Hyper-V cluster with 7 nodes in total, all running version 1607.

Last Monday, during daytime production, all virtual machines suddenly went offline. Our engineer rushed to the site, connected to Failover Cluster Manager, and found the Cluster Shared Volume offline (event IDs 5120 and 5142). Bringing the shared volume online manually failed, so we fell back on restarting all the nodes. During startup the shared volume came online briefly, but once every node had finished restarting it was offline again.

Watching the cluster, we saw the shared volume constantly moving back and forth between nodes 1, 2, 3, and 4, each of which reported a failure when trying to bring it online. When we manually moved the shared volume to node 5 or 6, it came online successfully, but within a few minutes the cluster automatically moved it back to nodes 1, 2, 3, and 4. In the end we had to shut down nodes 1, 2, 3, and 4. The shared volume then came online on node 5 and the cluster started working. As soon as we powered nodes 1, 2, 3, and 4 back on, the cluster stopped working normally again, so we temporarily disabled the network on those four nodes to make sure the customer could keep production running during the day.

During maintenance that night, the engineer reconnected nodes 1, 2, 3, and 4 to the network, and the same fault appeared again. We then removed one of the nodes from the domain and tried to rejoin it, and the system reported a network error. Following information found online, we checked the Server, TCP/IP NetBIOS Helper, and Netlogon services and found that all of them were disabled. We set these services to Automatic and started them manually. After that the node joined the domain without problems and worked normally once it was back in the cluster.
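The service check described above can be repeated on every node before it rejoins the cluster. The following is only a minimal sketch: it assumes the three services have the key names `LanmanServer`, `lmhosts`, and `Netlogon`, that `sc qc <name>` prints an English `START_TYPE` line, and the helper names are made up for illustration.

```python
import re
import subprocess

# Assumed service key names for the three services mentioned in the post.
REQUIRED_SERVICES = {
    "LanmanServer": "Server",
    "lmhosts": "TCP/IP NetBIOS Helper",
    "Netlogon": "Netlogon",
}

# Matches a line such as "START_TYPE         : 4   DISABLED".
START_TYPE_RE = re.compile(r"START_TYPE\s*:\s*(\d+)\s+(\S+)")


def parse_start_type(sc_qc_output):
    """Return the start type name (e.g. 'DISABLED') from `sc qc` text, or None."""
    match = START_TYPE_RE.search(sc_qc_output)
    return match.group(2) if match else None


def find_disabled(outputs):
    """Given {service_key: `sc qc` text}, return display names of disabled services."""
    disabled = []
    for key in sorted(outputs):
        if key in REQUIRED_SERVICES and parse_start_type(outputs[key]) == "DISABLED":
            disabled.append(REQUIRED_SERVICES[key])
    return disabled


def query_service(key):
    """Run `sc qc <key>` and return its text output (Windows only)."""
    result = subprocess.run(["sc", "qc", key], capture_output=True, text=True)
    return result.stdout


def check_local_node():
    """Query all three services on the local machine and list the disabled ones."""
    return find_disabled({key: query_service(key) for key in REQUIRED_SERVICES})
```

Running `check_local_node()` on a node returns the display names of any of the three services whose start type is disabled. An empty list means none of them is disabled, which does not by itself prove the services are running.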
I now have some questions and hope someone can help answer them, because we need to provide the customer with a failure analysis report:
1. In theory, the Server, TCP/IP NetBIOS Helper, and Netlogon services all start automatically. Why would they end up disabled for no apparent reason? The customer's machines normally run antivirus protection, and after the recovery we ran an antivirus scan on the machines but found no virus.
2. Why were the services disabled on only some of the nodes while the other nodes were normal? Is there a related bug in Server 2016, and has Microsoft released a patch for it? If so, what is the patch number? (There seems to have been a related bug in Server 2012 clusters for which Microsoft issued a patch, but I could not find any such information for 2016.)

CodePudding user response:

If the scan found no virus, it may really be a system bug. Have you not asked Microsoft for support?

CodePudding user response:

My guess is that someone followed advice from the Internet about shutting down unnecessary services. When the ransomware outbreak happened a while back, wasn't disabling port 445 suggested as a way to protect against it?
That is done by disabling the Server service, which closes port 445.
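One quick way to test this guess from another machine is to see whether a node still accepts TCP connections on port 445. This is only a rough sketch using the Python standard library. The function name is made up for illustration, and a closed port can also be caused by a firewall, so a negative result does not prove that the Server service is disabled.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `is_port_open("node1.example.local", 445)` (a hypothetical hostname) returning `False` on the faulty nodes and `True` on the healthy ones would be consistent with the Server service having been disabled only on nodes 1 to 4.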