Since ElasticSearch is hot sh*# these days, and my old hacker friend Thomas Ardal wrote a nifty guide on how to install it on Windows VMs in Azure, I thought I might as well supplement it with a guide on how to do the same thing, only on Ubuntu VMs in Azure.
So, in this guide I’ll take you through the steps necessary to set up three Ubuntu VMs in Azure, install an ElasticSearch node on each of them, and finally connect the nodes into a search cluster. Here goes:
First, create a new virtual network
Unless you intend to add your new Ubuntu VMs to an existing virtual network, you should use the “New” button and go and create a new virtual network. You can just fill in the name and leave all other options at their defaults.
Create virtual machines
Now, go and create a new virtual machine from the gallery.
Select the latest Ubuntu from the list.
Give your virtual machine a sensible name. In this case, since this is the third machine in my ElasticSearch cluster, I’m calling it “elastica3”. For all three machines, I’ve created a user account called “mhg” on the machine so I can SSH to it.
On the first machine, be sure to create a new cloud service that you can use to load balance requests among the machines. When adding the subsequent machines, remember to select the existing cloud service. In this case, since it’s balancing among “elastica1”, “elastica2”, and “elastica3”, I’m calling the cloud service “elastica”.
Moreover, it’s important that you add the machines to the same availability set! This way, Azure will ensure that the machines are unlikely to crash/be disconnected/fail at the same time by putting the machines in different fault domains.
When the first machine was added, the public port 22 on the cloud service “elastica” got automatically mapped to port 22 on the machine. When adding the subsequent machines, select another public port to map to 22 so that you can SSH to each individual machine from the outside. I chose 23 and 24 for the two other machines.
SSH to each machine
Open up a terminal and
ssh mhg@elastica.cloudapp.net -p 22
in order to SSH to the first machine, logging in as “mhg”. In this example, I’m using the (default) port 22 which I will replace with 23 and 24 in order to SSH to the other two machines.
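Since the three machines only differ in the public port, the SSH commands can be generated with a small loop. This is just a sketch: the user “mhg”, the host name elastica.cloudapp.net, and the ports 22, 23, and 24 come from my setup in this guide, and `ssh_commands` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print the SSH command for each VM behind the cloud service.
# Assumptions: user, host, and ports are the ones used in this guide;
# ssh_commands is a hypothetical helper, not a standard tool.
ssh_commands() {
  local user host port
  user="$1"
  host="$2"
  shift 2
  for port in "$@"; do
    echo "ssh ${user}@${host} -p ${port}"
  done
}

ssh_commands mhg elastica.cloudapp.net 22 23 24
```

Swap in your own user name, cloud service name, and port list as needed.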
On each machine, I start out by running a
sudo apt-get update
in order to download the most recent apt-get package lists.
Now, on each machine I install Java by going
sudo apt-get install openjdk-7-jre-headless -y
and at this point I usually feel inspired to go grab myself a cup of coffee…
Download and install ElasticSearch
And, finally, we’re ready to install ElasticSearch – go to the download page and copy the URL of the DEB package. At the time of writing this, the most recent DEB package is https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.deb which I download and install on each machine like this:
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.deb
sudo dpkg -i elasticsearch-0.90.5.deb
sudo service elasticsearch start
Configure ElasticSearch cluster
In order to be able to edit the configuration file, I
sudo apt-get install emacs
sudo emacs /etc/elasticsearch/elasticsearch.yml
By default, ElasticSearch will use UDP multicast to dynamically discover an existing cluster, which it will automatically join. On Azure though, we must explicitly specify which nodes go into our cluster. In order to do this, uncomment the line
discovery.zen.ping.multicast.enabled: false
to disable UDP discovery, and then add the full list of the IP addresses of your machines on the following line:
discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
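If you have more than a handful of nodes, typing that list by hand gets error-prone. Here is a minimal sketch that builds the line from a list of IPs. Note that `unicast_hosts_line` is a helper name I made up for illustration; it is not part of ElasticSearch.

```shell
#!/usr/bin/env bash
# Build the unicast hosts setting from a list of private IPs.
# unicast_hosts_line is a hypothetical helper, not an ElasticSearch tool.
unicast_hosts_line() {
  local list ip
  list=""
  for ip in "$@"; do
    # Separate entries with ", " after the first one.
    if [ -n "$list" ]; then
      list="${list}, "
    fi
    list="${list}\"${ip}\""
  done
  echo "discovery.zen.ping.unicast.hosts: [${list}]"
}

unicast_hosts_line 10.0.0.4 10.0.0.5 10.0.0.6
```

The output can be pasted straight into elasticsearch.yml.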
In my case, the IPs assigned to the VMs were 10.0.0.4 through 10.0.0.6. You can use ifconfig on each machine if you’re in doubt which IP was assigned (or you can check it out via the Azure Portal).
After saving each file, remember to
sudo service elasticsearch restart
for ElasticSearch to pick up the changes.
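If you’d rather not open an editor on every machine, the same two changes can probably be scripted. The sketch below assumes GNU sed (which is what Ubuntu ships) and assumes the config file contains the multicast setting as a commented-out line starting with “#”. `configure_unicast` is a hypothetical helper name, and running it twice would append the host list twice.

```shell
#!/usr/bin/env bash
# Apply both discovery settings to a config file without an editor.
# Assumptions: GNU sed, and a commented-out line of the form
#   "# discovery.zen.ping.multicast.enabled: false" in the file.
# configure_unicast is a hypothetical helper name.
configure_unicast() {
  local file hosts
  file="$1"
  hosts="$2"
  # Uncomment the multicast setting by stripping the leading "#" and spaces.
  sed -i 's/^#[[:space:]]*\(discovery\.zen\.ping\.multicast\.enabled:[[:space:]]*false\)/\1/' "$file"
  # Append the explicit host list at the end of the file.
  printf 'discovery.zen.ping.unicast.hosts: %s\n' "$hosts" >> "$file"
}

# Example call (the real file is owned by root, so run the script with sudo):
# configure_unicast /etc/elasticsearch/elasticsearch.yml '["10.0.0.4", "10.0.0.5", "10.0.0.6"]'
```

It’s worth trying this on a copy of the file first, and the restart is still needed afterwards.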
Check it out
Now, on any of the three machines, try running the following curl command:
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
which should yield something like this:
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
Finally, let’s make the cluster accessible from the outside.
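Before opening things up, though, it may be worth turning the health check above into something a script can act on. The following is a rough sketch: `check_health` is a helper name I made up, and it uses plain grep, which assumes the pretty-printed layout shown above rather than parsing the JSON properly.

```shell
#!/usr/bin/env bash
# Read cluster health JSON on stdin and succeed only if the status is
# "green" and the node count matches the expected number.
# check_health is a hypothetical helper; grep-based matching assumes the
# pretty-printed output format shown in this guide.
check_health() {
  local expected_nodes json
  expected_nodes="$1"
  json="$(cat)"
  echo "$json" | grep -Eq '"status" *: *"green"' || return 1
  echo "$json" | grep -Eq "\"number_of_nodes\" *: *${expected_nodes}([^0-9]|\$)" || return 1
}

# Example call on one of the VMs:
# curl -s 'http://localhost:9200/_cluster/health?pretty=true' | check_health 3 && echo "cluster looks healthy"
```

If the check fails, the usual suspects are a node that wasn’t restarted or a typo in the host list.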
Set up load balancing among the three VMs
Go to the “Endpoints” tab of the first VM and add a new endpoint for port 9200, the port ElasticSearch serves HTTP on.
Remember to check the option that you want to create a new load-balanced set. Just go with the defaults when asked about how the load balancer should probe the endpoints.
The last thing is to add an endpoint to each of the two other VMs, selecting the existing load-balanced set.
When this step is completed, you should be able to visit your cloud service URL (in my case it was http://elastica.cloudapp.net:9200) and see something like this:
name: "Machine Teen",
tagline: "You Know, for Search"
So, is it usable yet?
Not sure, actually – I haven’t had time to investigate how to properly set up an authorization mechanism so as to make my cluster accessible only to specific applications.
If anyone knows how to do that on Azure, please don’t hesitate to enlighten me.