Scaling an Elasticsearch cluster: Part I

The following points can help improve Elasticsearch performance and let a cluster scale better. Here is an ideal cluster infrastructure, based on my research.

  1. Single-point URL for searching and indexing.
  2. Load balancer tool.
  3. A master node.
  4. A couple of non-data, non-master nodes.
  5. Set of data nodes.
  6. Memory management.
  7. Elasticsearch config.
  8. Monitoring tools.

Single-point URL for search and index:- It is good to keep a single URL for searching and indexing, which can be done with a load balancer tool. Behind the load balancer, configure the master and non-data nodes as backends. Data nodes are deliberately left out of the backend pool so that they do not receive unwanted HTTP requests; they never have to serve the HTTP requests that arrive for searching and indexing data.

That way, data nodes are free to search their shards or index data as each request requires.
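As a rough illustration of which nodes sit behind the load balancer, here is a minimal Python sketch. The host names and the node inventory are hypothetical, and the round-robin picker only stands in for whatever load balancer tool you actually use.

```python
from itertools import cycle

# Hypothetical inventory: host names and role flags are made up for illustration.
NODES = [
    {"host": "es-master-1", "master": True, "data": False},
    {"host": "es-client-1", "master": False, "data": False},
    {"host": "es-client-2", "master": False, "data": False},
    {"host": "es-data-1", "master": False, "data": True},
    {"host": "es-data-2", "master": False, "data": True},
]


def http_backends(nodes):
    """Return the hosts that should sit behind the load balancer.

    Data nodes are excluded so they never receive client HTTP traffic directly.
    """
    return [node["host"] for node in nodes if not node["data"]]


def round_robin(hosts):
    """Return an endless iterator that hands out hosts in turn."""
    return cycle(hosts)


backends = http_backends(NODES)
picker = round_robin(backends)
first_four = [next(picker) for _ in range(4)]

print(backends)    # only the master and the non-data nodes
print(first_four)  # the fourth request wraps around to the first backend
```

A real load balancer would also health-check its backends; that part is left out here.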

Master Node:- For the stability and best performance of the Elasticsearch cluster, and in line with Elasticsearch's own recommendation, we should keep a separate node as the master node. This is done by setting "node.data: false" and "node.master: true" in the config file.

All other nodes should find this master node through the following config properties:

  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.minimum_master_nodes: 1
  • discovery.zen.ping.unicast.hosts: ["master node"]

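Keep a couple of non-master, non-data nodes for serving HTTP requests, so that client traffic has an entry point that does not depend on the master node. Note that a node with node.master set to false cannot itself be elected master, so surviving a master failure also requires more than one master-eligible node.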
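The three node roles described so far differ only in two flags, so their config can be generated from one template. The sketch below is a minimal Python example under that assumption. The role names ("master", "client", "data") and the host name are hypothetical, and the setting names are the ones listed in this post, so check them against your Elasticsearch version.

```python
def render_node_config(role, master_host):
    """Build elasticsearch.yml lines for one node role.

    role is "master", "client" (non-master, non-data), or "data".
    master_host is the address other nodes use to find the master.
    """
    roles = {
        "master": (True, False),
        "client": (False, False),
        "data": (False, True),
    }
    if role not in roles:
        raise ValueError(f"unknown role: {role}")
    is_master, is_data = roles[role]
    lines = [
        f"node.master: {str(is_master).lower()}",
        f"node.data: {str(is_data).lower()}",
        "discovery.zen.ping.multicast.enabled: false",
        "discovery.zen.minimum_master_nodes: 1",
        f'discovery.zen.ping.unicast.hosts: ["{master_host}"]',
    ]
    return "\n".join(lines)


# "es-master-1" is a placeholder host name.
print(render_node_config("data", "es-master-1"))
```

Calling it with "client" gives both flags as false, which is the non-master, non-data node that serves HTTP requests.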

Data Nodes:- Data nodes exist to run search requests against their shards, send the results back, and index new data into the cluster. Compared with the master and non-data nodes, they therefore need more RAM and processing power.

Since data nodes hold the data, size their disks according to your data volume.

A very important question: how many data nodes should I keep?

Answer:- From what I have seen so far: if you have less than 500 GB of data, 5 shards per index, and a good amount of search and indexing traffic, you need 5 data nodes to balance performance. With 3 data nodes, two of them hold 2 shards each; with 4 data nodes, one of them holds 2 shards. The shard distribution is then uneven, which leads to:

  • Load issues.
  • Disk space issues.
  • Instability in the cluster.
  • Poor search performance.

If the data size is less than 150 GB, it will not matter much.

Note: I tested this on an 8-core CentOS machine with 16 GB of RAM; I will come up with actual numbers and stats in Part II.
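The uneven distribution for 3 or 4 data nodes is simple arithmetic. A small Python sketch, assuming a single index, primary shards only (no replicas), and an allocator that spreads shards as evenly as it can:

```python
def shards_per_node(num_shards, num_nodes):
    """Spread shards as evenly as possible; return the shard count per node."""
    base, extra = divmod(num_shards, num_nodes)
    # The first `extra` nodes each take one shard more than the rest.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


for nodes in (3, 4, 5):
    print(nodes, shards_per_node(5, nodes))
# 3 [2, 2, 1]
# 4 [2, 1, 1, 1]
# 5 [1, 1, 1, 1, 1]
```

Only with 5 data nodes does every node carry the same share of a 5-shard index.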

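Memory management :- For now, keep it simple: give 50% of the machine's RAM to the Elasticsearch JVM heap and leave the other 50% to the operating system, whose file system cache Lucene relies on.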
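As a worked example of the 50% rule, here is a minimal Python sketch. The function name is hypothetical, and printing the result as ES_HEAP_SIZE assumes an older Elasticsearch release that reads that environment variable; newer releases set the heap through JVM options instead.

```python
def heap_size_mb(total_ram_mb, heap_fraction=0.5):
    """Return the JVM heap size in MB for a machine with total_ram_mb of RAM."""
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return int(total_ram_mb * heap_fraction)


ram_mb = 16 * 1024  # a 16 GB machine
heap = heap_size_mb(ram_mb)
print(f"ES_HEAP_SIZE={heap}m")  # ES_HEAP_SIZE=8192m
```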

Elasticsearch config properties :-

  • cluster.name – Cluster name.
  • node.name – Node name.
  • node.master – Enable for master node.
  • node.data – Enable for data node.
  • transport.tcp.compress: true
  • discovery.zen.ping.timeout: 10s
  • discovery.zen.minimum_master_nodes: 1
  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: ["Master node"]
  • action.disable_shutdown: true
  • action.disable_delete_all_indices: true
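One way to keep nodes consistent is to compare each node's config against the values above. The sketch below is a minimal Python example that covers only a subset of the listed settings. The parser handles flat "key: value" lines only and is not a full YAML parser, and the sample config text is made up.

```python
# A subset of the recommended values from the list above.
RECOMMENDED = {
    "transport.tcp.compress": "true",
    "discovery.zen.ping.timeout": "10s",
    "discovery.zen.minimum_master_nodes": "1",
    "discovery.zen.ping.multicast.enabled": "false",
    "action.disable_shutdown": "true",
}


def parse_flat_settings(text):
    """Parse simple 'key: value' lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def find_mismatches(settings, recommended=RECOMMENDED):
    """Return {key: (actual, expected)} for keys that differ or are missing."""
    return {
        key: (settings.get(key), expected)
        for key, expected in recommended.items()
        if settings.get(key) != expected
    }


# Made-up config text for illustration.
sample = """
cluster.name: my-cluster
transport.tcp.compress: true
discovery.zen.ping.timeout: 3s
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
"""

mismatches = find_mismatches(parse_flat_settings(sample))
print(mismatches)
```

For the sample text, the report flags the shorter ping timeout and the missing action.disable_shutdown entry.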

Monitoring tools :-

The following are good monitoring tools that I liked and found very helpful.

1) Elastic-hq – http://www.elastichq.org/
2) Elastic head – https://github.com/mobz/elasticsearch-head
3) Paramedic – https://github.com/karmi/elasticsearch-paramedic
4) Bigdesk
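Alongside these tools, a quick scripted check is possible through Elasticsearch's cluster health endpoint (GET /_cluster/health). The Python sketch below is a minimal example: the helper names are hypothetical, the base URL is whatever single URL your load balancer exposes, and the sample payload is hand-written rather than taken from a real cluster.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url, timeout=5):
    """GET <base_url>/_cluster/health and return the decoded JSON body."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_health(health):
    """Turn a cluster health response into a one-line summary."""
    return (
        f"{health['cluster_name']}: status={health['status']}, "
        f"nodes={health['number_of_nodes']}, "
        f"data_nodes={health['number_of_data_nodes']}, "
        f"unassigned_shards={health['unassigned_shards']}"
    )


def needs_attention(health):
    """Flag anything other than a green cluster with no unassigned shards."""
    return health["status"] != "green" or health["unassigned_shards"] > 0


# Hand-written sample payload; the values are made up.
sample = {
    "cluster_name": "my-cluster",
    "status": "yellow",
    "number_of_nodes": 8,
    "number_of_data_nodes": 5,
    "unassigned_shards": 5,
}
print(summarize_health(sample))
print(needs_attention(sample))
```

Against a live cluster you would pass the result of fetch_cluster_health to the same two functions.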

 

Coming soon: Part II

  • How many search requests can a cluster handle?
  • What is the ideal index size with respect to performance?
  • Memory and disk size forecasting for an Elasticsearch node.
  • Shards: keep more or fewer?
  • Segmentation and routing?

Categories: Elasticsearch, Website Performance