Ask To Apps

Tag : elasticsearch-performance

Scaling an Elasticsearch cluster: Part I

28/05/2014 Article

The following points can help improve Elasticsearch performance and let a cluster scale better. Here is an ideal cluster infrastructure, based on my research.

[Figure: ScalingES]

  1. A single URL for searching and indexing.
  2. A load balancer.
  3. A master node.
  4. A couple of non-data, non-master nodes.
  5. A set of data nodes.
  6. Memory management.
  7. Elasticsearch config.
  8. Monitoring tools.

Single URL for searching and indexing:- It is good to keep a single URL for both searching and indexing, which a load balancer makes possible. Behind the load balancer, configure the master and non-data nodes as backends. Data nodes are deliberately left out of the backend pool so that they do not have to handle the incoming HTTP requests for searching and indexing.

That way, data nodes are free to search their shards or index documents as each request requires.
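To make the idea concrete, here is a minimal sketch of what the load balancer does with the backend pool: it hands out the non-data nodes in rotation. The host names are made up for illustration, and a real setup would use a proper load balancer rather than this script.

```python
from itertools import cycle

# Hypothetical addresses of the non-data nodes behind the load balancer.
# Data nodes are intentionally absent from this pool.
BACKEND_NODES = [
    "http://es-client-1:9200",
    "http://es-client-2:9200",
]


def round_robin(nodes):
    """Return an endless iterator that rotates through the backend nodes."""
    return cycle(nodes)


backends = round_robin(BACKEND_NODES)
# Pick a backend for each of four incoming requests.
first_four = [next(backends) for _ in range(4)]
print(first_four)
```

Each request goes to the next node in the pool, so no single non-data node takes all the HTTP traffic.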

Master node:- For the stability and best performance of the cluster, and in line with Elasticsearch's recommendation, we should keep a separate node as the master node. This is done by setting node.data: false and node.master: true in the config file.

All other nodes should find this master node through the following config properties:

  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.minimum_master_nodes: 1
  • discovery.zen.ping.unicast.hosts: ["master node"]

Keep a couple of non-master, non-data nodes to serve HTTP requests. They will also help if the master node goes down.

Data nodes:- Data nodes are dedicated to running search requests against their shards, sending the results back, and indexing new data into the cluster. Compared with the master node and the non-data nodes, they therefore need more RAM and processing power.

Because data nodes hold the data, size their disks according to your data volume.

An important question is: how many data nodes should I keep?

Answer:- For now I can say this: if you have less than 500 GB of data, 5 shards per index, and a good amount of search and indexing requests, you need 5 data nodes for balanced performance. If you keep 3 data nodes, 2 of them will hold 2 shards each; if you keep 4, 1 of them will hold 2 shards. The shard distribution will not be proportionate, which leads to:

  • Load issues.
  • Disk space issues.
  • Instability in the cluster.
  • Poor search performance.
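The uneven distribution is easy to check with a little arithmetic. The sketch below assumes the shards of one index are spread as evenly as possible and ignores replicas, which is a simplification of what Elasticsearch's allocator really does.

```python
def shards_per_node(num_shards, num_nodes):
    """Spread shards as evenly as possible and return the count on each node.

    Simplifying assumption: primaries only, no replicas, a single index.
    """
    base, extra = divmod(num_shards, num_nodes)
    # The first `extra` nodes each carry one shard more than the rest.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


for nodes in (3, 4, 5):
    print(nodes, "data nodes ->", shards_per_node(5, nodes))
```

With 5 shards, only the 5-node layout gives every data node the same share; with 3 or 4 nodes, some nodes carry double the load of others.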

If the data size is less than 150 GB, it will not matter much.

Note: I tested this on an 8-core CentOS machine with 16 GB of RAM; I will come up with actual numbers and stats in Part II.

Memory management:- For now, keep it simple: give 50% of RAM to the JVM heap and leave the other 50% to the operating system's file system cache, which Lucene relies on.
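As a rough sketch of that rule, the helper below halves the machine's RAM to get a heap size. The 31 GB ceiling is my own addition, based on the commonly cited guidance to keep the heap below about 32 GB so the JVM can keep using compressed object pointers; treat it as an assumption, not a measured limit.

```python
def suggested_heap_gb(total_ram_gb, max_heap_gb=31):
    """Return half of the RAM as heap size, capped at max_heap_gb.

    max_heap_gb=31 is an assumed ceiling (see the note above), not a
    value taken from this article's tests.
    """
    return min(total_ram_gb // 2, max_heap_gb)


heap = suggested_heap_gb(16)
# Elasticsearch 1.x reads the heap size from the ES_HEAP_SIZE environment variable.
print(f"ES_HEAP_SIZE={heap}g")
```

For the 16 GB test machine this gives an 8 GB heap and leaves the remaining 8 GB for the file system cache.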

Elasticsearch config properties :-

  • cluster.name
  • node.name
  • node.master – set to true on the master node.
  • node.data – set to true on data nodes.
  • transport.tcp.compress: true
  • discovery.zen.ping.timeout: 10s
  • discovery.zen.minimum_master_nodes: 1
  • discovery.zen.ping.multicast.enabled: false
  • discovery.zen.ping.unicast.hosts: ["master node"]
  • action.disable_shutdown: true
  • action.disable_delete_all_indices: true
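The properties above differ between node types only in node.master and node.data, so one way to keep the configs consistent is to generate them. This is a small sketch, not an official tool: the function names, the "client" role label, and the cluster and host names are all hypothetical, and the output is meant to be pasted into each node's elasticsearch.yml.

```python
# (node.master, node.data) for each role; "client" means non-master, non-data.
ROLES = {
    "master": (True, False),
    "data": (False, True),
    "client": (False, False),
}


def node_settings(role, cluster_name, node_name, master_host):
    """Build the settings from the list above for one node of the given role."""
    is_master, is_data = ROLES[role]
    return {
        "cluster.name": cluster_name,
        "node.name": node_name,
        "node.master": is_master,
        "node.data": is_data,
        "transport.tcp.compress": True,
        "discovery.zen.ping.timeout": "10s",
        "discovery.zen.minimum_master_nodes": 1,
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": [master_host],
    }


def to_yaml_lines(settings):
    """Render the settings as simple 'key: value' lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, list):
            rendered = "[" + ", ".join(f'"{item}"' for item in value) + "]"
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return lines


# Hypothetical names, for illustration only.
data_node_lines = to_yaml_lines(
    node_settings("data", "my-cluster", "data-1", "es-master-1")
)
print("\n".join(data_node_lines))
```

Calling node_settings with "master" or "client" instead produces the matching config for the other node types.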

Monitoring tools :-

The following are good monitoring tools that I liked and found very helpful.

1)      ElasticHQ – http://www.elastichq.org/

2)      elasticsearch-head – https://github.com/mobz/elasticsearch-head

3)      elasticsearch-paramedic – https://github.com/karmi/elasticsearch-paramedic

4)      bigdesk

 

Coming soon: Part II

  • How many search requests can a cluster handle?
  • What is the ideal index size with respect to performance?
  • Memory and disk size forecasting for an Elasticsearch node.
  • Shards: keep more or fewer?
  • Segmentation and routing?

 

 

Categories: Elasticsearch, Website Performance

Tags: Elasticsearch, elasticsearch performance, scaling elasticsearch

About Author:

Appa

Why Elasticsearch makes two master nodes in a cluster: the split-brain problem

23/02/2014 Article

I faced a major issue with Elasticsearch: after some time, my cluster automatically ended up with more than one node acting as master. As a result, it showed two sets of nodes in one cluster. This has the following effects:

  • Cluster health becomes yellow.
  • Nodes suffer CPU load issues.
  • System memory usage is high.
  • Search and indexing performance suffers.
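One way to confirm that the cluster has really split is to ask every node which master it follows and compare the answers. The sketch below assumes you have already collected those answers, for example from each node's _cat/master API; the function name and the node names in the sample data are hypothetical.

```python
def detect_split_brain(reported_masters):
    """Check whether the nodes disagree about who the master is.

    reported_masters maps a node name to the master that node reports.
    Returns (is_split, sorted list of the distinct masters seen).
    """
    masters = sorted(set(reported_masters.values()))
    return len(masters) > 1, masters


# Hypothetical sample: two groups of nodes, each following a different master.
sample = {
    "node-1": "node-1",
    "node-2": "node-1",
    "node-3": "node-4",
    "node-4": "node-4",
}
is_split, masters_seen = detect_split_brain(sample)
print(is_split, masters_seen)
```

If every node names the same master, the first value is False and the cluster is not split.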

You might face the same issue, and that is normal. It may be due to:

  • Problem in inter-communication between nodes.
  • Network issue in cluster.
  • Big index size and high search/index bandwidth.

It is very easy to fix this issue. A master node maintains the cluster and hands indexing and search requests to the data nodes, while a data node stores data: when it receives a request from a client, it searches its shards or indexes the document. If we ask one node to do both jobs, it becomes difficult to manage, because the master then has to maintain the cluster as well as search and index data, which causes performance issues. The best solution is to keep the roles separate. Keep only one master node in your cluster, and configure all nodes to look to that same master for the cluster state. For failover, you might keep one extra master node for disaster recovery.

The following are the settings to do that.

Master node

node.master: true
node.data: false
transport.tcp.compress: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 15s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["master node"]

Data node

node.master: false
node.data: true
transport.tcp.compress: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 15s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["master node"]

Change the config on all nodes accordingly and restart them.
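A note on discovery.zen.minimum_master_nodes: the value 1 matches a cluster with a single master-eligible node. If you add extra master-eligible nodes for failover, the Elasticsearch documentation for zen discovery suggests setting it to a quorum, (master-eligible nodes / 2) + 1, so that two halves of a partitioned cluster cannot each elect their own master. A minimal sketch of that formula, with a function name of my own choosing:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    return master_eligible_nodes // 2 + 1


for count in (1, 2, 3):
    print(count, "master-eligible ->", minimum_master_nodes(count))
```

With two master-eligible nodes the quorum is 2, which means losing either one stops master election; that is why three master-eligible nodes are often suggested when failover matters.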

 

 

Categories: Elasticsearch, Website Performance

Tags: Elasticsearch, elasticsearch performance, split brain problem

About Author:

Copyright Ask To Apps 2022 | Proudly powered by WordPress