Learn how to use Kubernetes & Elasticsearch

March 27, 2018 / by Andrés Rodríguez and Cristyan Sepulveda

This guide walks through installing Docker, Elasticsearch and Kubernetes, and then deploying Elasticsearch on a Kubernetes cluster.

What are Elasticsearch and Kubernetes? We will explain them through examples.

Definition

Kubernetes: an open source system created by Google to manage applications in containers, allowing actions such as scheduling, deployment, scaling and monitoring of containers. It grew out of Google's experience running its own products, such as Gmail, the search engine, Drive and Maps, in containers on its internal cluster manager, Borg. Its goal is to run services in the cloud, and one of its distinguishing features was that it was built from the start to work with Docker.

Docker, meanwhile, is a very powerful containerization system. In other words, it is a tool that creates lightweight, portable containers for software applications that can run on any machine with Docker installed, regardless of where it is executed. Together, Docker packages applications into containers easily, and Kubernetes quickly distributes and runs them on the machines of a cluster.

Elasticsearch: a distributed search and analytics engine designed for horizontal scalability, reliability and easy management. It combines the speed of search with the power of analytics through a developer-friendly query language.

It can index and analyze large amounts of data in near real time, in a distributed manner. The data is stored as documents, structured or not.

Having briefly introduced these technologies, we will now install the tools. All installation and testing was done on Linux, using a Fedora distribution. Some commands change depending on the operating system, but the process should be the same and produce the same results.

 

Installation

Docker: To start, go to the official Docker website, https://docs.docker.com/install/, choose the operating system you will be installing on, and follow the steps below. As mentioned above, we use a Fedora Linux distribution.

The first step is to add the Docker repository:

Then install Docker from that repository:

Once installed, start Docker:

and verify that it works by running the default test image:
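The commands for these four steps did not survive in this copy of the post. As a rough sketch, based on Docker's Fedora instructions from around the time of writing, the steps could be collected in a script such as the one below. The file name install-docker.sh is an assumption, and `dnf config-manager --add-repo` is the dnf 4 form of the command, so newer Fedora releases may need a different syntax.

```shell
# Write the Docker installation steps to a script so they can be reviewed first.
# install-docker.sh is a hypothetical file name.
cat > install-docker.sh <<'EOF'
#!/bin/sh
set -e
# 1. Add the Docker CE repository (dnf 4 syntax).
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
# 2. Install Docker from that repository.
sudo dnf -y install docker-ce
# 3. Start the Docker service.
sudo systemctl start docker
# 4. Run the default test image.
sudo docker run hello-world
EOF
chmod +x install-docker.sh
```

After checking the contents, the script can be run with ./install-docker.sh on the Fedora machine.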

Elasticsearch: First, go to https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html for the download instructions. While there are several ways to install it, the aim here is to install Elasticsearch using Docker.

We will run the following command to download the latest image of Elasticsearch:

Now start the downloaded image in development mode with this instruction:

Finally, to check that Elasticsearch is working correctly, open localhost:9200 in the web browser. You should see a JSON response with the node name, cluster name and version.
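The download and run commands are missing from this copy of the post. A minimal sketch of the two steps, following the pattern in Elastic's Docker documentation, might look like this. The image tag 6.2.3 and the file name run-elasticsearch.sh are assumptions; use whichever release the Elastic page lists.

```shell
# Hypothetical image tag; pick the release you want from Elastic's Docker page.
ES_IMAGE="docker.elastic.co/elasticsearch/elasticsearch:6.2.3"

# Write the two steps to a script (run-elasticsearch.sh is a hypothetical name).
cat > run-elasticsearch.sh <<EOF
#!/bin/sh
set -e
# Download the image.
sudo docker pull $ES_IMAGE
# Start a single node in development mode, exposing the HTTP and transport ports.
sudo docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" $ES_IMAGE
EOF
chmod +x run-elasticsearch.sh
```

With the container running, curl http://localhost:9200 from a second terminal is an alternative to the browser check.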

Kubernetes: The first step is to decide whether to work with the Cloud Shell offered by Google or with a local shell. We will explain both.

For the first option, Cloud Shell, we must first go to https://console.cloud.google.com/home/dashboard?project=solid-ruler-197414&_ga=2.240435719.-568970604.1520383743 and log in with our Google account.

Then we click on the button "Activate Google Cloud Shell" located at the top right.


And we will have the shell ready to start working.

Now, if we want to use the local shell, first go to https://cloud.google.com/sdk/docs/quickstarts and select the installation for your operating system. We will continue the example using the Fedora Linux distribution.

Download the package that matches the architecture of your operating system.

Once downloaded, you must decompress it and run the installation using the command:

The next step is to initialize using this command:

Then accept the login prompt that the command prints on screen.

The web browser then opens and asks us to sign in with our Google account.

And finally, we install the kubectl component using this command:
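The commands for the local shell setup are missing from this copy. As a sketch of the sequence described above, assuming the downloaded archive unpacks into a google-cloud-sdk directory (the archive name depends on your architecture, so it is passed in as an argument rather than guessed), the steps could be collected like this. The file name install-gcloud.sh is hypothetical.

```shell
# Write the Cloud SDK setup steps to a script (install-gcloud.sh is a hypothetical name).
cat > install-gcloud.sh <<'EOF'
#!/bin/sh
set -e
# Path to the archive downloaded from the quickstart page, given as the first argument.
ARCHIVE="$1"
# Decompress the package and run the bundled installer.
tar -xzf "$ARCHIVE"
./google-cloud-sdk/install.sh
# Initialize the SDK; this prompts for login and opens the browser.
./google-cloud-sdk/bin/gcloud init
# Install the kubectl component.
./google-cloud-sdk/bin/gcloud components install kubectl
EOF
chmod +x install-gcloud.sh
```

It would be invoked as ./install-gcloud.sh followed by the path to the downloaded archive.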

From here on out, the procedure will continue to be the same regardless of the shell being used.

Example

After the installation of the tools, we start by creating a simple and complete project. 

The first step is to create a new project. For that, we go to https://console.cloud.google.com/kubernetes/list?project=solid-ruler-197414&authuser=1, which takes us to the Google Cloud console.

Once there, we go to "Select a project" at the top and select the + icon to create a new project. We give our project a name, in this case "workep.com".

The next step is to enable the Kubernetes API. To do this, we go to the left panel of the Google Cloud options and, under "APIs & Services", select the "Library" option.

We can now search for Google Kubernetes Engine API in the search bar, and enable it.


Now that we can work with the Kubernetes API, we must set a region and a zone.

Recall that Kubernetes Engine runs as a cloud service, so location is a factor that determines the availability of certain resources. Each region has one or more zones. For example, us-central1 refers to the central region of the United States, which has the following zones: us-central1-a, us-central1-b, us-central1-c and us-central1-f.
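To illustrate the naming scheme, the sketch below derives the region from a zone name by dropping the trailing letter suffix, then prints the gcloud commands that would set both as defaults. The choice of us-central1-b is only an example.

```shell
# Example zone; replace with the zone closest to your users.
zone="us-central1-b"
# A zone name is the region name plus a "-<letter>" suffix, so strip that suffix.
region="${zone%-*}"

echo "region: $region"
echo "zone:   $zone"
# Commands that would store these as the gcloud defaults (printed, not executed).
echo "gcloud config set compute/region $region"
echo "gcloud config set compute/zone $zone"
```

Printing the commands rather than running them keeps the sketch safe to try on a machine where gcloud is not yet configured.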

It is important to choose a region and zone so that failures are handled properly. Google designs zones to be independent of each other: a zone generally has power, cooling, network and control planes that are isolated from other zones, and most failure events affect only a single zone. Therefore, if a zone becomes unavailable, you need to transfer traffic to another zone in the same region to keep your services running. Similarly, if a region is experiencing a disruption, you should have backup services running in a different region.

It is also important to reduce network latency. You may want to choose a region that is near your point of service. For example, if most of your clients are on the east coast of the US, you would want to choose a region, a main zone and a backup zone close to the east coast.

The diagram provides some examples of how regions and zones relate to each other. Note that each region is independent of other regions, and each zone is isolated from other zones within the same region.

We must create a cluster and deploy our project in Kubernetes.

A cluster consists of a master machine and several worker machines called nodes. The nodes are virtual machine (VM) instances that run the Kubernetes processes needed to make them part of the cluster. That is, applications are deployed to the cluster and then run on the nodes.

Although there are various ways of doing this, the idea of this guide is to do so with Elasticsearch. Therefore, we must first create a text file with any name you wish, as long as it has the .yaml extension. YAML (YAML Ain't Markup Language) is a human-readable data serialization format.

Once we have created the file, we put in the following configuration:
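The original listing is missing from this copy of the post. The sketch below is a reconstruction from the field descriptions that follow, not the authors' file, and it has not been validated against a live cluster. The file name, the elastic-cluster name and label, the busybox init image, the Elasticsearch image tag, the IPC_LOCK capability and the es-data-disk persistent disk (which would have to exist already) are all assumptions. It uses apps/v1, which requires the selector block, because the apps/v1beta1 version named in the text was removed in Kubernetes 1.16.

```shell
# Write a Deployment manifest; all names and the image tag are hypothetical.
cat > workep-kube.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elastic-cluster
  template:
    metadata:
      labels:
        app: elastic-cluster
    spec:
      initContainers:
      # Raises vm.max_map_count on the node before Elasticsearch starts.
      - name: init-sysctl
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
        env:
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
        - name: bootstrap.memory_lock
          value: "true"
        ports:
        - containerPort: 9200
        securityContext:
          capabilities:
            add: ["IPC_LOCK"]
        volumeMounts:
        - name: es-data
          mountPath: /usr/share/elasticsearch/data
      volumes:
      # Assumes a GCE persistent disk named es-data-disk already exists.
      - name: es-data
        gcePersistentDisk:
          pdName: es-data-disk
          fsType: ext4
EOF
```

A file like this would typically be submitted to the cluster with kubectl create -f workep-kube.yaml.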

When you create an object in Kubernetes, you must specify the desired state of that object. That is why it is important to understand each of the fields set in the file.

apiVersion: the API group and version used to create this type of object. In our case we used apps/v1beta1; on Kubernetes 1.16 and later, Deployments use apps/v1 instead.

kind: the type of object being created. In this case it is a Deployment, because we want Kubernetes to create and replace replicas of our application as needed.

metadata: data that uniquely identifies the object.

spec: the specification of the object. The precise format of spec is different for each Kubernetes object and contains nested fields specific to that object.

replicas: the number of copies of the application to keep running, which is the basis of self-healing applications on a Kubernetes cluster. It defaults to 1.

initContainers: containers that run before the regular containers. They differ from regular containers in that they always run to completion. They use separate images from the application containers, which makes it possible to include and run utilities that are not desirable in the image of the regular container.

vm.max_map_count: Elasticsearch uses an mmapfs directory by default to store its indices. The default operating system limits on mmap counts are likely to be too low, which may result in out-of-memory exceptions.

You can increase the limit by running, as root, the command sysctl -w vm.max_map_count=262144
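Before changing the limit, it can help to see what the machine currently uses. This small sketch, assuming a Linux host that exposes the value under /proc, compares the current setting with the 262144 mentioned above and prints what to do.

```shell
# Value recommended in the text above.
required=262144
# Read the current limit; fall back to 0 where /proc is not available.
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)

if [ "$current" -lt "$required" ]; then
  echo "vm.max_map_count is $current; raise it with: sudo sysctl -w vm.max_map_count=$required"
else
  echo "vm.max_map_count is $current; no change needed"
fi
```

A value set with sysctl -w lasts only until the next reboot.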

env: the environment variables, which provide information about the container itself and about other objects within the cluster.

For Elasticsearch, a minimum and maximum JVM heap size of 2 GB was set with -Xms2g and -Xmx2g, passed through ES_JAVA_OPTS.

bootstrap.memory_lock was also set to "true" to prevent memory from being swapped out, which would hurt performance.

containers: the image to be used and its settings. In our case it is the latest version of Elasticsearch. We also defined the access port, 9200, and the path where the data volume will be mounted, /usr/share/elasticsearch/data.

securityContext: the security context, which defines the privileges and access control settings for a container.

volumes: the volume to be created. In our case, a persistent disk is created with the ext4 format, which is highly recommended for virtual machines.

Having clarified this information, we will deploy to our cluster in Kubernetes through the following instruction:

An output message, deployment "workep-Kube" created, is shown, stating that the deployment was created successfully.

It should now appear in our workspace.

Kubernetes Engine offers integrated support for two types of cloud load balancing for a publicly accessible application: TCP and HTTP(S).

In our case, the idea is to expose our HTTP(S) project hosted on Kubernetes, so the recommendation is to use the HTTP(S) load balancer.

The next step is to expose our deployment as an internal service. That is, we create a Service resource to make the elastic-cluster deployment accessible within our cluster.

When we create a NodePort-type service, Kubernetes makes the service available on a randomly selected high port number on all nodes in the cluster.
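The command used at this step is missing from this copy. One way to express such a service is as a manifest, sketched below. The file name, the service name and the app: elastic-cluster selector label are assumptions, and the selector has to match whatever labels the deployment's pods actually carry.

```shell
# Write a NodePort Service manifest; names and labels are hypothetical.
cat > elastic-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: elastic-cluster
spec:
  type: NodePort
  selector:
    app: elastic-cluster
  ports:
  # Service port and the container port it forwards to.
  - port: 9200
    targetPort: 9200
EOF
```

It could be created with kubectl apply -f elastic-service.yaml, after which kubectl get service elastic-cluster shows the node port that was assigned.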

In our workspace, we can see that the service has been created.

Now, to make our HTTP(S) server application publicly accessible, we must create an Ingress resource.

Ingress is a Kubernetes resource that encapsulates a set of rules and configuration for routing external HTTP(S) traffic to internal services. When an Ingress is created in the cluster, Kubernetes Engine creates an HTTP(S) load balancer and configures it to route traffic to the application.

The next step is to create a .yaml configuration file that defines an Ingress resource, which directs traffic to our service.
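The original Ingress file is missing from this copy. A minimal sketch of one that sends all traffic to a single backend service could look like the following. The file name, the Ingress name and the elastic-cluster service name are assumptions. It uses the networking.k8s.io/v1 API; at the time of writing the same resource would have been declared under extensions/v1beta1, which was removed in Kubernetes 1.22.

```shell
# Write an Ingress manifest with a single default backend; names are hypothetical.
cat > elastic-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elastic-ingress
spec:
  # All external HTTP(S) traffic goes to this service and port.
  defaultBackend:
    service:
      name: elastic-cluster
      port:
        number: 9200
EOF
```

Once applied, kubectl get ingress elastic-ingress lists the external IP address after the load balancer has been provisioned, which can take a few minutes.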

To deploy the Ingress resource that we just created, we must execute the following instruction:

Once deployed, Kubernetes creates an Ingress resource in our cluster. The Ingress controller running in the cluster is responsible for creating an HTTP(S) load balancer to route all external HTTP traffic to the NodePort service that was exposed.

To verify that it works, we simply open the web browser and load the external IP address of the load balancer. We should see the plain-text HTTP response from our application.

Conclusions

Simply, Google: It is not about showing favoritism, but we must recognize that Google knows what it is doing. Although it is hard to believe, Google's services run in containers, and Kubernetes, its own invention, grew out of that experience.
The main concept of Kubernetes is to eliminate infrastructure lock-in by providing core capabilities for containers without imposing restrictions, that is, to allow applications to run in multiple operating environments, including dedicated servers, virtualized private clouds and public clouds. All this comes at a very affordable cost that allows large, medium and small enterprises to develop, scale and manage applications, making good use of the resources provided by Google.

Optimizing what's important, data: Today, the large volumes of data being handled need to be manipulated through agile methods, which is why it is important to use Elasticsearch. Success stories of companies using Elasticsearch, like Facebook and Netflix, show that thanks to indexing you can quickly reach the data you need without having to go through every location where data is stored. The data is therefore obtained almost instantly, regardless of how much data accumulates within the company.
