Install ElasticSearch on Ubuntu VMs in Azure

Since ElasticSearch is hot sh*# these days, and my old hacker friend Thomas Ardal wrote a nifty guide on how to install it on Windows VMs in Azure, I thought I might as well supplement with a guide on how to do the same thing, only on Ubuntu VMs in Azure….

So, in this guide I’ll take you through the steps necessary to set up three Ubuntu VMs in Azure and install an ElasticSearch node on each of them, and finally connect the nodes into a search cluster… here goes:

First, create a new virtual network

Unless you intend to add your new Ubuntu VMs to an existing virtual network, you should use the “New” button and go and create a new virtual network. You can just fill in the name and leave all other options at their defaults.

guide01

Create virtual machines

Now, go and create a new virtual machine from the gallery.

guide02

Select the latest Ubuntu from the list.

guide03

Give your virtual machine a sensible name – in this case, since this is the third machine in my ElasticSearch cluster, I’m calling it “elastica3”. For all three machines, I’ve created a user account called “mhg” on the machine so I can SSH to it.

guide04

On the first machine, be sure to create a new cloud service that you can use to load balance requests among the machines. When adding the subsequent machines, remember to select the existing cloud service. In this case, since it’s balancing among “elasica1”, “elastica2”, and “elastica3”, I’m calling the cloud service “elastica”.

Moreover, it’s important that you add the machines to the same availability set! This way, Azure will ensure that the machines are unlikely to crash/be disconnected/fail at the same time by putting the machines in different fault domains.

guide05

When the first machine was added, the public port 22 on the cloud service “elastica” got automatically mapped to port 22 on the machine. When adding the subsequent machines, select another public port to map to 22 so that you can SSH to each individual machine from the outside. I chose 23 and 24 for the two other machines.

guide06

SSH to each machine

Open up a terminal and

in order to SSH to the first machine, logging in as “mhg”. In this example, I’m using the (default) port 22 which I will replace with 23 and 24 in order to SSH to the other two machines.

Update apt-get

On each machine, I start out by running a

in order to download the most recent apt-get package lists.

Install Java

Now, on each machine I install Java by going

and at this point I usually feel inspired to go grab myself a cup of coffee… 😉

Download and install ElasticSearch

And, finally, we’re ready to install ElasticSearch – go to the download page and copy the URL of the DEB package. At the time of writing this, the most recent DEB package is https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.deb which I download and install on each machine like this:

Configure ElasticSearch cluster

In order to be able to edit the configuration file, I

and go

By default, ElasticSearch will use UDP to dynamically discover an existing cluster which it will automatically join. On Azure though, we must explicitly specify which nodes go into our cluster. In order to do this, uncomment the line

to disable UDP discovery, and then add the full list of the IP addresses of your machines on the following line:

In my case, the IPs assigned to the VMs were 10.0.0.4 through 10.0.0.6. You can use ifconfig on each machine if you’re in doubt which IP was assigned (or you can check it out via the Azure Portal).

After saving each file, remember to

for ElasticSearch to pick up the changes.

Check it out

Now, on any of the three machines, try CURLing the following command:

which should yield something like this:

Finally, let’s make the cluster accessible from the outside….

Set up load balancing among the three VMs

Go to the first VM on the “Endpoints” tab and add a new endpoint.

guide07

guide08

Remember to check the option that you want to create a new load-balanced set. Just go with the defaults when asked about how the load balancer should probe the endpoints.

Last thing is to add an endpoint to the two other VMs, selecting the existing load-balanced set.

guide09

When this step is completed, you should be able to visit your cloud service URL (in my case it was http://elastica.cloudapp.net:9200) and see something like this:

So, is it usable yet?

Not sure, actually – I haven’t had time to investigate how to properly set up an authorization mechanism so as to make my cluster accessible only to specific applications.

If anyone knows how to do that on Azure, please don’t hesitate to enlighten me 🙂

7 thoughts on “Install ElasticSearch on Ubuntu VMs in Azure

  1. Great guide! Running ElasticSearch on Linux seems like the mature choice.

    Regarding security, neither ElasticSearch nor Azure supports this out of the box. I would really like Microsoft to implement “local endpoints” or something, making you cluster accessible from Azure websites and cloud services only. Until this happen (maybe never), you can use the Jetty plugin for ElasticSearch which support both HTTPS and basic authentication. I’m using that with a local generated key on elmah.io. Seems to work pretty good.

    1. Cool! I think I’ll have a look at the Jetty plugin – seems like a fairly simple way of enabling secure communication.

      @Kenneth: Seems like a nice and simple solution to providing HTTP basic auth, but it should probably not be used for any kind of sensitive data when it is exposed to the internet. Looks neat though 🙂

Leave a Reply to mookid Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.