Your document collection is big!
Scan through all the documents every time you search for something?
Pre-process the documents and create an index!
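Here is a minimal sketch of that pre-processing step: an inverted index that maps each term to the ids of the documents containing it, so a search becomes a lookup instead of a scan. It assumes a simplified analyzer (lowercase, then split on non-alphanumeric characters) rather than Lucene's real analysis chain. The helper names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplified analyzer (assumption): lowercase, then split on anything
    # that is not an ASCII letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "The Pragmatic Programmer",
    2: "Clean Code",
    3: "Working Effectively with Legacy Code",
}
index = build_inverted_index(docs)
print(index["code"])  # [2, 3]: one lookup, no scan over every document
```

The cost moves to indexing time: each document is analyzed once, and every later query only touches the posting lists of its own terms.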
How well each document matches the query
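To make "how well each document matches" concrete, the sketch below ranks documents for one term with a simplified version of classic TF-IDF, the scoring family Elasticsearch used by default before it switched to BM25 in 5.0. Frequent terms in short fields score higher, and rare terms weigh more. It ignores boosts, query normalization and Lucene's lossy norm encoding, so the numbers will not match Elasticsearch's scores; the function names are hypothetical.

```python
import math
import re


def tokenize(text):
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tf_idf_rank(docs, term):
    """Rank documents for a single, already lowercased query term.

    tf   = sqrt(occurrences of the term in the document)
    idf  = 1 + ln(N / (df + 1))             rarer terms weigh more
    norm = 1 / sqrt(terms in the document)  shorter fields weigh more
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = sum(1 for terms in tokenized.values() if term in terms)
    idf = 1 + math.log(len(tokenized) / (df + 1))
    scores = {}
    for doc_id, terms in tokenized.items():
        freq = terms.count(term)
        if freq:
            scores[doc_id] = math.sqrt(freq) * idf / math.sqrt(len(terms))
    # Highest score first, like the hits of a search response.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    1: "The Pragmatic Programmer",
    2: "Clean Code",
    3: "Working Effectively with Legacy Code",
}
for doc_id, score in tf_idf_rank(docs, "code"):
    print(doc_id, round(score, 4))
```

Both matching books contain "code" once, so the shorter name ("Clean Code") ranks first, which is the same order the search example in this deck returns.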
GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code
With filtering, aggregations, highlighting, pagination...
Count things and summarize your data: lots of it, often timestamped!
Logs > Logstash > Elasticsearch > Kibana
Commonly used in addition to another database...
wget https://download.elasticsearch.org/elasticsearch/release/...
tar -zxvf elasticsearch-2.2.0.tar.gz
cd elasticsearch-2.2.0/bin
./elasticsearch
Open http://localhost:9200 in your web browser. It returns a short JSON description of the node, similar to this (the exact fields and values depend on the installed version):
{
  "status":200,
  "name":"Cypher",
  "cluster_name":"elasticsearch",
  "version":{
    "number":"1.5.2",
    "build_hash":"62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp":"2015-04-27T09:21:06Z",
    "build_snapshot":false,
    "lucene_version":"4.10.4"
  },
  "tagline":"You Know, for Search"
}
JSON documents!
{
  "title": "Elasticsearch Workshop",
  "date": "2016-04-08"
}
The act of storing data in Elasticsearch is called indexing.
$ curl -X POST localhost:9200/books/computer/1 --data '{
  "name": "The Pragmatic Programmer",
  "category": "Programming",
  "price": 29.90
}'
$ curl -X POST localhost:9200/books/computer/2 --data '{
  "name": "Clean Code",
  "category": "Programming",
  "price": 14.90
}'
$ curl -X POST localhost:9200/books/computer/3 --data '{
  "name": "Working Effectively with Legacy Code",
  "category": "Refactoring",
  "price": 45.50
}'
$ curl -X GET localhost:9200/books/computer/1
Result:
{
  "_index": "books",
  "_type": "computer",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "The Pragmatic Programmer",
    "category": "Programming",
    "price": 29.9
  }
}
$ curl -X PUT localhost:9200/books/computer/1 --data '{
  "name":"The Awesome Programmer"
}'
Result (PUT replaces the whole document, so the stored source now contains only "name"):
{
  "_index":"books",
  "_type":"computer",
  "_id":"1",
  "_version":2,
  "created":false
}
$ curl -X DELETE localhost:9200/books/computer/1
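The index, get, update and delete responses above can be summarized with a toy in-memory model. This is a sketch only, assuming nothing about how Elasticsearch stores data internally: it mimics what the responses show, namely that every write bumps `_version`, indexing an existing id replaces the whole document (`created: false`), and get reports `found`. The class and method names are hypothetical.

```python
class ToyDocStore:
    """Toy in-memory model of the index / get / delete responses.

    NOT how Elasticsearch stores data; it only mimics the visible behaviour.
    """

    def __init__(self):
        self._docs = {}  # (index, type, id) -> (version, source)

    def index(self, index, doc_type, doc_id, source):
        key = (index, doc_type, doc_id)
        previous = self._docs.get(key)
        version = previous[0] + 1 if previous else 1
        # The new source replaces the old one entirely; fields are not merged.
        self._docs[key] = (version, dict(source))
        return {"_index": index, "_type": doc_type, "_id": doc_id,
                "_version": version, "created": previous is None}

    def get(self, index, doc_type, doc_id):
        entry = self._docs.get((index, doc_type, doc_id))
        response = {"_index": index, "_type": doc_type, "_id": doc_id,
                    "found": entry is not None}
        if entry is not None:
            response["_version"], response["_source"] = entry
        return response

    def delete(self, index, doc_type, doc_id):
        # Simplification: real Elasticsearch also versions deletes;
        # this toy just forgets the document.
        return self._docs.pop((index, doc_type, doc_id), None) is not None


store = ToyDocStore()
store.index("books", "computer", "1",
            {"name": "The Pragmatic Programmer", "category": "Programming", "price": 29.90})
print(store.index("books", "computer", "1", {"name": "The Awesome Programmer"}))
print(store.get("books", "computer", "1")["_source"])  # {'name': 'The Awesome Programmer'}
store.delete("books", "computer", "1")
print(store.get("books", "computer", "1")["found"])  # False
```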
Find all books that contain the word "code"
$ curl -X GET 'localhost:9200/books/computer/_search?q=code'
Sorted by relevance!
{
  "took":6,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":2,
    "max_score":0.15342641,
    "hits":[
      {
        "_index":"books",
        "_type":"computer",
        "_id":"2",
        "_score":0.15342641,
        "_source":{
          "name":"Clean Code",
          "category":"Programming",
          "price":14.9
        }
      },
      {
        "_index":"books",
        "_type":"computer",
        "_id":"3",
        "_score":0.11506981,
        "_source":{
          "name":"Working Effectively with Legacy Code",
          "category":"Refactoring",
          "price":45.5
        }
      }
    ]
  }
}
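A client usually needs only a few fields of that response: `hits.total` and each hit's `_score` and `_source`. The sketch below reads them from a trimmed copy of the response above, using only Python's standard `json` module; the function name is hypothetical.

```python
import json

# Trimmed copy of the search response above (only the fields used here).
RESPONSE = """
{
  "hits": {
    "total": 2,
    "max_score": 0.15342641,
    "hits": [
      {"_id": "2", "_score": 0.15342641, "_source": {"name": "Clean Code"}},
      {"_id": "3", "_score": 0.11506981,
       "_source": {"name": "Working Effectively with Legacy Code"}}
    ]
  }
}
"""


def ranked_names(response_text):
    """Return (total, [names...]) from a search response, best match first."""
    response = json.loads(response_text)
    hits = response["hits"]["hits"]
    # Hits already arrive sorted by _score; sorting again only makes it explicit.
    hits = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
    return response["hits"]["total"], [hit["_source"]["name"] for hit in hits]


total, names = ranked_names(RESPONSE)
print(total, names)
```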
Mapping is used to define how a document, and the fields it contains, are stored and indexed.
This is similar to a database schema.
Define the data types of the document fields, typically when the index is created:
{
  "mappings": {
    "book": {
      "properties": {
        "name": {
          "type": "string"
        },
        "category": {
          "type": "string",
          "index": "not_analyzed"
        },
        "price": {
          "type": "float"
        }
      }
    }
  }
}
Find the books with a name that contains the word "code"
$ curl -X GET 'localhost:9200/books/book/_search' -d '{
  "query": {
    "match": {
      "name": "code"
    }
  }
}'
Find books belonging to the "Programming" category
{
  "query": {
    "term": {
      "category": "Programming"
    }
  }
}
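Why does the mapping mark `category` as `not_analyzed`? The sketch below models the difference between the two queries: `match` analyzes the query text before comparing, while `term` looks the value up exactly as given. It assumes a simplified analyzer (lowercase plus split on non-alphanumerics) in place of the standard analyzer; the function names are hypothetical.

```python
import re


def analyze(text):
    # Simplified stand-in for the standard analyzer (assumption):
    # lowercase, then split on non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


# Terms stored for "Clean Code" under the mapping above:
# "name" is analyzed, "category" is not_analyzed (kept as one exact term).
name_terms = analyze("Clean Code")  # ['clean', 'code']
category_terms = ["Programming"]


def match_query(field_terms, query_text):
    # match: the query text is analyzed the same way, and any shared term is a hit
    # (the match query's default operator is OR).
    return bool(set(analyze(query_text)) & set(field_terms))


def term_query(field_terms, value):
    # term: the value is looked up exactly as given, with no analysis.
    return value in field_terms


print(match_query(name_terms, "Code"))            # True
print(term_query(name_terms, "Code"))             # False: the index holds "code"
print(term_query(category_terms, "Programming"))  # True
print(term_query(category_terms, "programming"))  # False: exact match only
```

A `term` query on an analyzed field is a common pitfall: the capitalized value never matches the lowercased terms in the index.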
Find books belonging to the "Programming" category, while skipping relevance scoring
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "Programming" } }
      ]
    }
  }
}
Query | Filter |
---|---|
Full text search | Exact match |
Relevance scoring | Binary yes/no |
Relatively slow | Fast |
Not cacheable | Cacheable |
Or combine them: filter first, then run the query on the remaining documents.
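A combined search can be written as one `bool` query, with scored `must` clauses next to unscored `filter` clauses. The sketch builds such a body in Python; `search_body` is a hypothetical helper, and the resulting JSON would be sent as the `-d` body of a `_search` request like the curl example above.

```python
import json


def search_body(match=None, filters=None):
    """Build a bool query body.

    match:   {field: text}  -> "must" match clauses (query context, scored)
    filters: {field: value} -> "filter" term clauses (filter context, yes/no)
    """
    bool_query = {}
    if match:
        bool_query["must"] = [{"match": {field: text}} for field, text in match.items()]
    if filters:
        bool_query["filter"] = [{"term": {field: value}} for field, value in filters.items()]
    return {"query": {"bool": bool_query}}


# Programming books whose name contains "code": the category filter narrows the
# candidates, and only the match clause contributes to the score.
body = search_body(match={"name": "code"}, filters={"category": "Programming"})
print(json.dumps(body, indent=2))
```

Calling `search_body(filters={"category": "Programming"})` on its own reproduces the filter-only body shown earlier.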
Metric | Bucket |
---|---|
Min | Range |
Max | Terms |
Sum | Histogram |
Avg | |
Stats | |
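Bucket and metric aggregations nest: a bucket aggregation groups documents, and a metric aggregation summarizes each group. The sketch shows a `terms` bucket per category with an `avg` metric inside it, first as a request body and then as an in-memory equivalent over the three books. It assumes `category` is `not_analyzed`, as in the mapping above, so buckets are whole values rather than individual tokens; the aggregation names and the helper function are hypothetical.

```python
from collections import defaultdict

# Request body for POST localhost:9200/books/_search: a "terms" bucket
# aggregation per category, with an "avg" metric aggregation inside each bucket.
AGG_BODY = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "by_category": {  # aggregation names such as "by_category" are arbitrary
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def terms_with_avg(docs, bucket_field, metric_field):
    """In-memory equivalent: group docs by exact value, then count and average."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


books = [
    {"name": "The Pragmatic Programmer", "category": "Programming", "price": 29.90},
    {"name": "Clean Code", "category": "Programming", "price": 14.90},
    {"name": "Working Effectively with Legacy Code", "category": "Refactoring", "price": 45.50},
]
print(terms_with_avg(books, "category", "price"))
```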
19 tasks - learning Query DSL
The data used during the workshop is a list of pizzas, along with their mapping
Feature: Topic of the task
// Use https://www.elastic.co/guide/en/...
Scenario: Description of the task
Given all pizzas are indexed
When I make a query
"""
{ todo }
"""
Then the response should contain
"""
{ subset }
"""
Correct (the opening """ is on its own line)
When I make a query
"""
{
...
}
"""
Wrong (the JSON starts on the same line as the opening """)
When I make a query
"""{
...
}
"""
Full response
{
  "workshop": "Elasticsearch",
  "date": "2016-04-08"
}
Subset (passes: every field in it also appears in the full response)
{
  "date": "2016-04-08"
}
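The workshop's own matching code is not shown here. A minimal recursive sketch of a "response should contain" check, assuming it means that every field of the subset appears in the full response with the same value, could look like this; `contains_subset` is a hypothetical name.

```python
def contains_subset(total, subset):
    """True if everything in `subset` also appears in `total`, recursively.

    Dicts match when each subset key exists in total with a matching value;
    lists match when each subset item matches some item of total;
    anything else must be equal.
    """
    if isinstance(subset, dict):
        return isinstance(total, dict) and all(
            key in total and contains_subset(total[key], value)
            for key, value in subset.items()
        )
    if isinstance(subset, list):
        return isinstance(total, list) and all(
            any(contains_subset(item, wanted) for item in total)
            for wanted in subset
        )
    return total == subset


full = {"workshop": "Elasticsearch", "date": "2016-04-08"}
print(contains_subset(full, {"date": "2016-04-08"}))  # True
print(contains_subset(full, {"date": "2016-04-09"}))  # False
```

Matching list items in any order suits search hits, where the test usually cares that a document is present rather than where it appears.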