How to Configure ElasticSearch for the Best Performance Possible

Search technology carries a heavy burden in log management. It has to index in near real time on a massive scale, often hundreds of thousands of log events every second, while reliably and efficiently serving a high volume of search queries against the very same index. The obvious answer is to tune ElasticSearch so that it handles both workloads as efficiently as possible, but this is not as easy as it sounds.

Planning for Growth

The first tip for optimizing ElasticSearch performance is to plan for cluster, index, and shard growth. The single biggest factor in terms of management is cluster size. A common mistake when using ElasticSearch is not paying attention to the overall size of the cluster state. For example, if a search spans too many shards or indices, the load can become too much to handle, which hurts the performance of the whole ES cluster.

It’s easy to do some basic math to avoid this, assuming you have reliable data about your system and you know the overall dataset size. With this information, you can quickly tell whether you are scaled correctly for your use case. When configuring ES, you have total control over the number of indices and shards in your system, which lets you stay well away from the danger zone and within the optimal performance zone.

The 1 TB (Terabyte) Use-Case Example

Take a 1 TB dataset as an example of why proper cluster planning matters. If your goal is to search a dataset of this size with optimal performance, then choosing the correct sharding (how the data is split) is critical. The math for this is simple:
1 TB at a shard factor of 15 = 1 TB / 15 ≈ 66.7 GB per shard (taking 1 TB as 1,000 GB). At roughly 66 GB per shard, with a large cluster of at least 15 ElasticSearch nodes/servers, you can achieve very good initial performance. Of course, there are many other factors to consider, but this use case can serve as a good starting point for a dataset of this size.
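The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name shard_size_gb is hypothetical, and the sketch assumes decimal units (1 TB = 1,000 GB), as the figures above do.

```python
def shard_size_gb(dataset_gb: float, shard_count: int) -> float:
    """Return the average size of one shard in gigabytes."""
    if shard_count <= 0:
        raise ValueError("shard_count must be a positive integer")
    return dataset_gb / shard_count


if __name__ == "__main__":
    # 1 TB dataset (taken as 1000 GB) split with a shard factor of 15.
    size = shard_size_gb(1000, 15)
    print(f"{size:.1f} GB per shard")  # prints: 66.7 GB per shard
```

Changing the shard count in the call makes it easy to compare candidate shard sizes before creating an index.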

Know Your Topology

To make the right configuration decisions in ES, you need to be well versed in your topology. When you set up your ES nodes, you define each one as a master node, a data node, or both by using two boolean properties (node.master and node.data) that are each set to true or false. That is the easy part when it comes to ES properties; for log management, there are some much more complex and difficult issues you’ll have to address.

Three Tier Elasticsearch Pattern

The image below shows the 3 Tier ElasticSearch Cluster Pattern. Dividing up the responsibilities of the cluster is common practice in enterprise implementations of Elastic. Defining nodes as Client, Master, and Data gives you clear boundaries for where you can scale horizontally. However, because many enterprise clients see an economic advantage in investing in larger servers, there is also a strategy of scaling vertically with ElasticSearch, and 3 Tier designs offer many benefits when scaling is important to the overall strategy.

[Image: 3 Tier ElasticSearch Cluster Pattern]
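As a rough sketch of how the three tiers differ in configuration, the Python snippet below renders the role settings for each tier. The property names node.master and node.data are an assumption here (they are the boolean role settings used by older ES releases and are not spelled out in this article), and TIER_SETTINGS and render_settings are hypothetical names used only for illustration.

```python
# Assumed role settings per tier, using boolean node.master / node.data flags.
TIER_SETTINGS = {
    "master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    # A client node holds no data and is never elected master;
    # it only routes requests to the rest of the cluster.
    "client": {"node.master": False, "node.data": False},
}


def render_settings(tier: str) -> str:
    """Return the configuration lines for the given tier as text."""
    settings = TIER_SETTINGS[tier]
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in settings.items()
    )


if __name__ == "__main__":
    for tier in TIER_SETTINGS:
        print(f"# {tier} tier")
        print(render_settings(tier))
```

Keeping the roles in one table like this makes it clear which tier to add nodes to when you scale horizontally.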

Avoid Swapping

Swapping memory on ES nodes significantly slows down the entire operation and, as such, should be avoided. When pages are copied out of RAM into swap space on disk, accessing them again can be tens of thousands of times slower than reading them from memory. The mlockall property lets ES lock its memory so that the operating system cannot swap it out. To change the property from false to true and disallow swapping, add the following line to the ES configuration file (elasticsearch.yml): “bootstrap.mlockall: true”. In the 5.x version of ES, this setting is instead: “bootstrap.memory_lock: true”. To determine whether the property is actually set to true or false on a running node, you can enter the following command and look at the mlockall value in the output: “curl http://localhost:9200/_nodes/process?pretty”.
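The same check can be scripted. The sketch below is a minimal example that assumes the response of the _nodes/process endpoint lists each node under a top-level "nodes" key, with the flag at process.mlockall. The function names are hypothetical, and the base URL is the localhost address used in the curl command above.

```python
import json
from urllib.request import urlopen


def nodes_without_memory_lock(nodes_info: dict) -> list:
    """Return the names of nodes whose process info does not report mlockall as true."""
    unlocked = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        if not node.get("process", {}).get("mlockall", False):
            # Fall back to the node id when no name is present.
            unlocked.append(node.get("name", node_id))
    return unlocked


def fetch_process_info(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the same data as the curl command, parsed from JSON."""
    with urlopen(f"{base_url}/_nodes/process") as response:
        return json.load(response)
```

Against a running node, calling nodes_without_memory_lock(fetch_process_info()) should return an empty list once memory locking is in effect on every node.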

Manage the Discovery Protocol

The default mechanism that nodes use to find and communicate with each other in a cluster is known as “zen discovery.” There are a variety of properties you can set to manage it, but the main ones are: “discovery.zen.minimum_master_nodes” (to control the minimum number of master-eligible nodes that must be visible before the cluster can elect a master and operate) and “” (to specify which group of hosts will be used for communicating with zen discovery).
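A commonly recommended value for discovery.zen.minimum_master_nodes is a quorum of the master-eligible nodes, that is (N / 2) + 1 using integer division, which reduces the risk of a split brain. This article does not state that rule itself, so treat the sketch below as an assumption; the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum size for the given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("at least one master-eligible node is required")
    # Integer division: a strict majority of the master-eligible nodes.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for count in (1, 3, 5):
        print(count, "master-eligible nodes ->", minimum_master_nodes(count))
```

With three dedicated master nodes, for example, this gives a value of 2, so the cluster keeps working if one master is lost but two isolated halves cannot both elect a master.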

Failure detection between nodes is controlled by this property: “discovery.zen.fd.ping_timeout”. Its value determines how long a node will wait for a ping response, and the default is 30s. If you are operating on a congested network, you may need to increase this value. Remember, the higher the value, the lower the chance that a slow node is wrongly treated as failed, but the longer a real failure takes to detect.

Use Doc Values

Doc values are an on-disk data structure that is built at document index time. They can make ES perform far better at analytical work, such as sorting and aggregations, than you would have anticipated: with doc values, ES has essentially become a columnar store. Compared with other normal fields within ES, doc values minimize your heap usage. They rely on the OS file system cache instead, so make sure that cache has enough memory available to keep disk reads to a minimum.
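To make the “columnar store” idea concrete, here is a toy Python sketch that turns row-oriented documents into one column per field and then aggregates over a single column. It is purely conceptual and is not how ES or Lucene lay out doc values on disk; all names and the sample log fields are hypothetical.

```python
from collections import defaultdict


def build_columns(documents: list) -> dict:
    """Turn a list of documents (rows) into one list of (doc_id, value) per field."""
    columns = defaultdict(list)
    for doc_id, doc in enumerate(documents):
        for field, value in doc.items():
            columns[field].append((doc_id, value))
    return dict(columns)


def average(columns: dict, field: str) -> float:
    """Aggregate over one column without touching any other field."""
    values = [value for _, value in columns[field]]
    return sum(values) / len(values)


if __name__ == "__main__":
    logs = [
        {"status": 200, "bytes": 512},
        {"status": 404, "bytes": 128},
        {"status": 200, "bytes": 260},
    ]
    columns = build_columns(logs)
    print(average(columns, "bytes"))  # prints: 300.0
```

The point of the layout is that an aggregation on one field reads only that field's column, which is why this kind of structure suits sorting and analytics.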

Know How to Navigate Allocation-Related Properties

It’s important to remember that the process of allocating shards to nodes is known as “shard allocation.” This process often happens during initial recovery, replica allocation, or re-balancing. It can also occur when nodes are being removed or added.

In Conclusion

ElasticSearch can be complicated to use for log management tasks. There are, however, various small tools and tricks that most people are not aware of and that can save a great deal of time and money. Use the tips in this article, and you will be well on your way to optimizing your ES deployment.