elasticsearch – mookid on code

I’ve written before on how you can install ElasticSearch on a couple of Ubuntu VMs in Azure. Getting the database engine up and running is of course pretty important in order to start working, but there are a couple of additional things that really make working with ElasticSearch super-smooth: Marvel and Kibana!

In this post I’ll show you how you can install both.

Marvel

Marvel is a database monitoring tool that can show you how your ElasticSearch is feeling with a huge amount of meters and gauges. I won’t go into that here, because I haven’t actually looked at that part yet. The thing I’m excited about is Sense, which is described as a “developer console” where you can go and do RESTful things to your ElasticSearch database.

Let’s install Marvel!

Open up a terminal and go to the root directory of your ElasticSearch installation. Depending on your platform, you’ll have to execute the following command with administrator rights, so it’s either sudo or

Screenshot 2014-06-13 at 21.11.00

for you – and then you go

> bin/plugin --install elasticsearch/marvel/latest

which should result in downloading and installing the Marvel plugin. If it succeeds, you should be able to go to the /_plugin/marvel path of your ElasticSearch node to see all the graphs and meters I talked about. Try going to Sense by clicking on the Dashboard button in the top right corner – now you can use a lightweight cURL-like syntax with autocomplete to e.g. put a document to the database like this:

sense1

or do a quick search like this:

sense2

Neat! The Marvel plugin costs money if you want to use it for production purposes, but as far as I can tell, it’s absolutely free to use during development, which is also the time when Sense makes the most sense, effectively meaning that you can consider Sense free.
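For comparison, the same kind of put-a-document and quick-search requests can also be made with plain cURL from a terminal. This is only a rough sketch: the index name blog, the type post, the id 1, and the sample document are made up for illustration, and it assumes a node is listening on localhost:9200.

```shell
# Assumption: an ElasticSearch node is reachable here.
ES_URL="http://localhost:9200"

# Builds the URL of a single document: <node>/<index>/<type>/<id>.
# The index, type and id used below are hypothetical.
doc_url() {
  printf '%s/%s/%s/%s\n' "$ES_URL" "$1" "$2" "$3"
}

# Builds the URL of a simple query-string search against one index.
search_url() {
  printf '%s/%s/_search?q=%s&pretty=true\n' "$ES_URL" "$1" "$2"
}

# Print the two URLs so you can see what would be called.
doc_url blog post 1
search_url blog title:hello

# With a node running, the actual requests would be:
#   curl -XPUT "$(doc_url blog post 1)" -d '{"title": "Hello ElasticSearch"}'
#   curl -XGET "$(search_url blog title:hello)"
```

Sense saves you the quoting and gives you autocomplete on top, which is why I prefer it for poking around.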

Now, let’s check out Kibana…

Kibana

Kibana is a simple generic ElasticSearch data analysis and dashboard tool that can help you visualize your data in pretty ways, and since it’s Apache 2.0-licensed, it’s absolutely free to use for all intents and purposes.

Since Kibana is just a modern SPA, you can go to Kibana’s GitHub and get the code, which you can put in a directory or host on a web server somewhere. You’ll just need to edit the line in config.js where the ElasticSearch URL is configured.

Another option is to let ElasticSearch do the hosting, which can be easily achieved by installing Kibana as a plugin like this:

> bin/plugin --install elasticsearch/kibana3

Small caveat: Installing the plugin will copy the entire source code as it looks in the Git repository, which means that the Kibana URL will become the base URL of the plugin /_plugin/kibana followed by the path /src/index.html – the full URL then becomes e.g. something like http://localhost:9200/_plugin/kibana/src/index.html.
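Since that URL is easy to get wrong, here is a tiny sketch that puts it together from the base URL of a node. The function name kibana_url is just something I made up; the path pieces are the ones described above.

```shell
# Builds the Kibana URL for a node where Kibana was installed as a plugin.
# $1 is the base URL of the node, with or without a trailing slash.
kibana_url() {
  base="${1%/}"   # drop one trailing slash, if present
  printf '%s/_plugin/kibana/src/index.html\n' "$base"
}

kibana_url http://localhost:9200
```

For a node on localhost this prints the same URL as in the example above.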

Even though Kibana is fairly generic and can be used to visualize your ElasticSearch data in many ways, it seems to give special treatment to time series-based data, where especially logs come to mind – which is probably why it has extra-special treatment for logs imported into ElasticSearch via logstash.

That concludes this small guide on how to install Marvel and Kibana. Stay tuned for more ElasticSearch 🙂

Since ElasticSearch is hot sh*# these days, and my old hacker friend Thomas Ardal wrote a nifty guide on how to install it on Windows VMs in Azure, I thought I might as well supplement with a guide on how to do the same thing, only on Ubuntu VMs in Azure….

So, in this guide I’ll take you through the steps necessary to set up three Ubuntu VMs in Azure and install an ElasticSearch node on each of them, and finally connect the nodes into a search cluster… here goes:

First, create a new virtual network

Unless you intend to add your new Ubuntu VMs to an existing virtual network, you should use the “New” button and go and create a new virtual network. You can just fill in the name and leave all other options at their defaults.

Create virtual machines

Now, go and create a new virtual machine from the gallery.

Select the latest Ubuntu from the list.

Give your virtual machine a sensible name – in this case, since this is the third machine in my ElasticSearch cluster, I’m calling it “elastica3”. For all three machines, I’ve created a user account called “mhg” on the machine so I can SSH to it.

On the first machine, be sure to create a new cloud service that you can use to load balance requests among the machines. When adding the subsequent machines, remember to select the existing cloud service. In this case, since it’s balancing among “elastica1”, “elastica2”, and “elastica3”, I’m calling the cloud service “elastica”.

Moreover, it’s important that you add the machines to the same availability set! This way, Azure will ensure that the machines are unlikely to crash/be disconnected/fail at the same time by putting the machines in different fault domains.

When the first machine was added, the public port 22 on the cloud service “elastica” got automatically mapped to port 22 on the machine. When adding the subsequent machines, select another public port to map to 22 so that you can SSH to each individual machine from the outside. I chose 23 and 24 for the two other machines.

SSH to each machine

Open up a terminal and

ssh mhg@elastica.cloudapp.net -p22

in order to SSH to the first machine, logging in as “mhg”. In this example, I’m using the (default) port 22 which I will replace with 23 and 24 in order to SSH to the other two machines.
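If you don’t feel like remembering which port goes where, a small loop can print the SSH command for each machine. This is just a sketch using my user name, cloud service name, and the three public ports I picked; the function name ssh_command is made up, so adjust everything to your own setup.

```shell
# Values from my setup; replace with your own user and cloud service.
SSH_USER="mhg"
SSH_HOST="elastica.cloudapp.net"

# Prints the SSH command for the machine mapped to public port $1.
ssh_command() {
  printf 'ssh %s@%s -p%s\n' "$SSH_USER" "$SSH_HOST" "$1"
}

# One public port per machine: 22, 23 and 24.
for port in 22 23 24; do
  ssh_command "$port"
done
```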

Update apt-get

On each machine, I start out by running a

sudo apt-get update

in order to download the most recent apt-get package lists.

Install Java

Now, on each machine I install Java by going

sudo apt-get install openjdk-7-jre-headless -y

and at this point I usually feel inspired to go grab myself a cup of coffee… 😉

Download and install ElasticSearch

And, finally, we’re ready to install ElasticSearch – go to the download page and copy the URL of the DEB package. At the time of writing this, the most recent DEB package is https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.deb which I download and install on each machine like this:

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.deb
sudo dpkg -i elasticsearch-0.90.5.deb
sudo service elasticsearch start

Configure ElasticSearch cluster

In order to be able to edit the configuration file, I

sudo apt-get install emacs

and go

sudo emacs /etc/elasticsearch/elasticsearch.yml

By default, ElasticSearch will use UDP multicast to dynamically discover an existing cluster, which it will automatically join. On Azure though, multicast isn’t available on virtual networks, so we must explicitly specify which nodes go into our cluster. In order to do this, uncomment the line

discovery.zen.ping.multicast.enabled: false

to disable UDP discovery, and then add the full list of the IP addresses of your machines on the following line:

discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

In my case, the IPs assigned to the VMs were 10.0.0.4 through 10.0.0.6. You can use ifconfig on each machine if you’re in doubt which IP was assigned (or you can check it out via the Azure Portal).
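If you have more than a handful of nodes, typing that list by hand gets old. Here is a small sketch that builds the unicast hosts line from a list of IP addresses; the function name unicast_hosts_line is made up, and the IPs are the ones from my virtual network.

```shell
# Prints the discovery.zen.ping.unicast.hosts line for the given IP addresses.
unicast_hosts_line() {
  list=""
  for ip in "$@"; do
    # Separate entries with ", " once the list has at least one entry.
    if [ -n "$list" ]; then
      list="$list, "
    fi
    list="$list\"$ip\""
  done
  printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$list"
}

unicast_hosts_line 10.0.0.4 10.0.0.5 10.0.0.6
```

The printed line can then be pasted into /etc/elasticsearch/elasticsearch.yml on each machine.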

After saving each file, remember to

sudo service elasticsearch restart

for ElasticSearch to pick up the changes.

Check it out

Now, on any of the three machines, try running the following cURL command:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

which should yield something like this:

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
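If you’d rather let a script decide whether all nodes have joined, something like the following sketch could work. It assumes the pretty-printed output shown above (one field per line, spaces around the colon), and the function names health_field and cluster_is_healthy are made up for this example.

```shell
# Reads pretty-printed cluster health JSON on stdin and prints the value
# of the field named $1, with quotes and the trailing comma removed.
health_field() {
  awk -v key="\"$1\"" '$1 == key { v = $3; gsub(/[",]/, "", v); print v }'
}

# Reads cluster health JSON on stdin and succeeds only if the status is
# green and the node count equals $1.
cluster_is_healthy() {
  json=$(cat)
  status=$(printf '%s\n' "$json" | health_field status)
  nodes=$(printf '%s\n' "$json" | health_field number_of_nodes)
  [ "$status" = "green" ] && [ "$nodes" = "$1" ]
}

# With the cluster running, on any of the machines:
#   curl -s 'http://localhost:9200/_cluster/health?pretty=true' \
#     | cluster_is_healthy 3 && echo "all three nodes are in"
```

Parsing JSON with awk is fragile, so treat this as a quick check rather than proper monitoring.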

Finally, let’s make the cluster accessible from the outside….

Set up load balancing among the three VMs

Go to the first VM on the “Endpoints” tab and add a new endpoint.

Remember to check the option that you want to create a new load-balanced set. Just go with the defaults when asked about how the load balancer should probe the endpoints.

The last thing is to add an endpoint to each of the two other VMs, selecting the existing load-balanced set.

When this step is completed, you should be able to visit your cloud service URL (in my case it was http://elastica.cloudapp.net:9200) and see something like this:

{
    ok: true,
    status: 200,
    name: "Machine Teen",
    version: {
        number: "0.90.5",
        build_hash: "c8714e8e0620b62638f660f6144831792b9dedee",
        build_timestamp: "2013-09-17T12:50:20Z",
        build_snapshot: false,
        lucene_version: "4.4"
    },
    tagline: "You Know, for Search"
}

So, is it usable yet?

Not sure, actually – I haven’t had time to investigate how to properly set up an authorization mechanism so as to make my cluster accessible only to specific applications.

If anyone knows how to do that on Azure, please don’t hesitate to enlighten me 🙂
