How to save Hive posts into Elasticsearch

Elasticsearch is a great search engine providing a distributed, multitenant-capable full-text search. It provides features like facets, keyword highlighting etc in the search user interface. Here is a tutorial to index Hive posts into elasticsearch.

image.png

Install Elasticsearch

Download Elasticsearch(ES) from the official site:

https://www.elastic.co/cn/downloads/elasticsearch

Decompress:

tar xvf elasticsearch-7.7.0-linux-x86_64.tar.gz

Then run the following command to start ES:

cd elasticsearch-7.7.0/bin 
./elasticsearch

Use curl to test:

curl localhost:9200

You will the the output like:

{
  "name" : "YOUR_SERVER_NAME",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "liHA066AQJeE91lv8lLqig",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Install Kibana

You don’t have to install Kibana prior to using ES. However, Kibana provides lots of additional features to your data.

Download from: https://www.elastic.co/cn/downloads/kibana

Decompress and modify file: config/kibana.yml, add this line:

server.host: "0.0.0.0"

Run Kibana

cd bin
./kibana

You will see the following UI

image.png

Import the eCommerence data for testing:

image.png

Open the devtool and run a simple query, you will see:

image.png

Important concepts in ES

Let’s compare the terms in ES with relational database:

ES RDB

Index => Database
Type => Table
Document => Row
Field => Column
Mapping => Schema

Install and test Python elasticsearch module

Install elasticsearch via pip:

pip install elasticsearch

Enter Python interactive mode, and run the following commands to create ES index called “hive-posts-index”:

>>> from datetime import datetime
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.indices.create(index='hive-posts-index', ignore=400)
{'acknowledged': True, 'shards_acknowledged': True, 'index': 'hive-posts-index'}

As shown in Kibana devtool, the ES index has been created:

image.png

Add a test document into your index:

es.index(index="hive-posts-index", id=1, body={"any": "data", "timestamp": datetime.now()})

Read the document you just added from ES:

>>> es.get(index="hive-posts-index", id=1)
{'_index': 'hive-posts-index', '_type': '_doc', '_id': '1', '_version': 1, '_seq_no': 0, '_primary_term': 1, 'found': True, '_source': {'any': 'data', 'timestamp': '2020-06-02T15:52:40.759337'}}

Add Hive posts into ES index

Here is a demo script to add my latest 5 posts into ES index:

from beem import Steem
from beem.account import Account
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()

hive    = Steem(nodes = 'https://api.hive.blog')
account = Account('aafeng', steem_instance = hive)
posts   = account.get_blog(start_entry_id=0, limit=5)

for post in posts:
    author   = post.author
    title    = post.title
    body     = post.body
    category = post.category
    permlink = post.permlink
    es.index(index="hive-posts-index", id=permlink, body={"author": author,\
                                                    "title":  title,\
                                                    "body":   body,\
                                                    "category": category,\
                                                    "permlink": permlink,\
                                                    "timestamp": datetime.now()})

We can query it in Kibana:

GET hive-posts-index/_search?q=*:*

The output suggests that my posts have been stored in ES:

image.png

We also can use curl to test if my posts have been saved into ES:

curl http://localhost:9200/hive-posts-index/_search\?size=5

Now we have flexi way to query post data:

GET hive-posts-index/_search?q=body:elasticsearch
GET hive-posts-index/_search?q=category:hive-105017