Elasticsearch cluster on separate VMs: how does the configuration work?

The ELK stack is overwhelmingly confusing to me.

I've set up a 5-node cluster on VMs with 2 [data, data_content] nodes and 3 dedicated [master] nodes, with plans to expand the data nodes to 4, with 2 warm nodes and 2 cold ones. They all agree on a single cluster UUID, but no matter how hard I try to understand how data is stored in Elasticsearch, I simply can't get it.

First question: when data is sent to the cluster, does it have to be sent to the data nodes or to the master nodes? And how do the nodes communicate with each other and agree on where the data will be stored?

Second question: in the Logstash.yml file there's an output line { hosts => ["node_ip:9200","etc"] } where you write the node IP. I couldn't find which kind of node the logs have to be sent to. Is it [master] or [data]? In either case, do I have to list every IP, or does Elasticsearch have some mechanism to identify itself as a cluster?

[Answering a possible question: "Why not set up a Kubernetes cluster?" Using VMs wasn't my decision.]

CodePudding user response：

When data is sent to the cluster, does it have to be sent to the data nodes or to the master nodes? And how do the nodes communicate with each other and agree on where the data will be stored?

Data is written to nodes that have any data role (data, data_content, data_hot, data_warm, etc.), but the request can be sent to any node in the cluster; Elasticsearch knows internally where to write the data. The nodes communicate with each other over TCP port 9300 (by default).
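As a rough illustration of how the cluster can know where a document belongs: Elasticsearch derives the target primary shard from a hash of the document's routing value (the document ID by default) taken modulo the number of primary shards. The sketch below assumes a simplified form of that formula and uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster. The function names and the shard-to-node table are hypothetical.

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Simplified routing: hash(routing) % number of primary shards.

    crc32 is only a stand-in for the real hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# Hypothetical layout: 3 primary shards spread over 2 data nodes.
SHARD_TO_NODE = {0: "data-node-1", 1: "data-node-2", 2: "data-node-1"}

def target_node(doc_id: str) -> str:
    # Any node that receives the request can compute the shard
    # and forward the write to the node that holds it.
    return SHARD_TO_NODE[pick_shard(doc_id, len(SHARD_TO_NODE))]

if __name__ == "__main__":
    for doc_id in ("log-001", "log-002", "log-003"):
        print(doc_id, "->", target_node(doc_id))
```

Because the calculation depends only on the routing value and the shard count, it gives the same answer on every node, which is why the request does not have to arrive at the node that ends up storing the document.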

Normally you should send requests to your data nodes. Some architectures also use nodes that have only the ingest role and send the requests to those.

Elasticsearch will also balance the shards of your indices across the data nodes so that each node holds roughly the same amount of data (this is not always possible, depending on the number of indices and shards you are using).
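To see why an even split is not always possible, here is a minimal sketch that places each shard on whichever node currently holds the fewest. This is a deliberately naive strategy chosen for illustration, not Elasticsearch's actual allocator, and the index and node names are made up.

```python
def balance_shards(shards, nodes):
    """Assign each shard to the node that currently holds the fewest shards."""
    assignment = {node: [] for node in nodes}
    for shard in shards:
        # On a tie, min() returns the first node in the list.
        least_loaded = min(nodes, key=lambda n: len(assignment[n]))
        assignment[least_loaded].append(shard)
    return assignment

if __name__ == "__main__":
    # Hypothetical index with 5 shards on a 2-node data tier.
    shards = [f"logs[{i}]" for i in range(5)]
    for node, held in balance_shards(shards, ["data-node-1", "data-node-2"]).items():
        print(node, len(held), held)
```

With 5 shards and 2 data nodes, one node necessarily ends up with 3 shards and the other with 2, so the balance is only approximate.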

In the Logstash.yml file there's a hosts => ["node_ip:9200","etc"] line where you write the node IP. I couldn't find which kind of node the logs have to be sent to. Is it [master] or [data]? In either case, do I have to list every IP, or does Elasticsearch have some mechanism to identify itself as a cluster?

You are talking about the hosts option in the elasticsearch output of a Logstash pipeline, right?

If so, this is basically the same as above: every node in the cluster can accept requests, and the cluster will route them to the data nodes that write the data, so you can use any node in the hosts option of your Logstash pipelines.

It is pretty common not to send requests to the master nodes and to send them only to nodes that have a data or ingest role.

The hosts option of the elasticsearch output in Logstash supports more than one host, which makes Logstash balance the requests across those hosts.
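A small sketch of that advice: given a map of node addresses to roles, keep only the nodes with a data or ingest role and rotate through them. The IP addresses and helper names are hypothetical, and the round-robin rotation is just one simple way to spread requests; it is not a description of how Logstash balances internally.

```python
from itertools import cycle

# Hypothetical cluster matching the question: 3 dedicated masters, 2 data nodes.
NODES = {
    "10.0.0.1": ["master"],
    "10.0.0.2": ["master"],
    "10.0.0.3": ["master"],
    "10.0.0.4": ["data", "data_content"],
    "10.0.0.5": ["data", "data_content"],
}

def output_hosts(nodes, port=9200):
    """Return "ip:port" for every node with a data* or ingest role."""
    return [
        f"{ip}:{port}"
        for ip, roles in nodes.items()
        if any(r == "ingest" or r.startswith("data") for r in roles)
    ]

def round_robin(hosts):
    """Yield the hosts in turn, forever."""
    return cycle(hosts)

if __name__ == "__main__":
    hosts = output_hosts(NODES)
    print("hosts =>", hosts)
    picker = round_robin(hosts)
    for _ in range(4):
        print("send batch to", next(picker))
```

The resulting list is what would go into the hosts option; the dedicated master nodes are left out so they stay free for cluster coordination.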