Lab 6.3: Distributed operations
Objective:
In this lab, you will perform search requests on different shards and see how Elasticsearch gathers the results.
- Create a new index that satisfies the following requirements:
  - the name of the index is blogs_tmp
  - has two primary shards
  - has zero replica shards
  Solution
  PUT blogs_tmp
  {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 0
    }
  }
- Using the Reindex API, reindex the documents from blogs into blogs_tmp.
  Solution
  POST _reindex
  {
    "source": { "index": "blogs" },
    "dest": { "index": "blogs_tmp" }
  }
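The _reindex response reports how many documents were processed and whether anything failed. The Python sketch below checks that response, assuming a fresh, empty blogs_tmp, so every source document should be counted as created. The helper name reindex_succeeded is hypothetical, and the function only inspects a response you have already received.

```python
def reindex_succeeded(response: dict) -> bool:
    """Check a _reindex response for a copy into a fresh, empty destination index.

    Hypothetical helper: it reads the "timed_out", "failures", "created" and
    "total" fields of the response body.
    """
    if response.get("timed_out") or response.get("failures"):
        return False
    # Into an empty index nothing can be "updated", so every processed
    # document should have been created.
    return response.get("created", 0) == response.get("total", -1)
```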
- Let's see how Elasticsearch distributed the documents. Use the _cat/shards API to examine the shards of the new index.
  Solution
  GET _cat/shards/blogs_tmp?v
  The number of documents (the docs column) should be approximately the same on each shard.
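The counts come out roughly equal because each document is routed to exactly one primary shard based on its _routing value, which defaults to the document _id. The Python sketch below is a simplified model, not Elasticsearch's own code. It uses SHA-256 as a stand-in for the Murmur3 hash Elasticsearch uses and ignores the routing-shards factor. The names pick_shard and distribution, and the sample ids, are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable

NUMBER_OF_SHARDS = 2  # matches "number_of_shards": 2 in the blogs_tmp settings


def pick_shard(doc_id: str, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """Simplified routing: hash the _id and take it modulo the primary shard count.

    Assumption: SHA-256 stands in for Murmur3, and the routing-shards factor is ignored.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_of_shards


def distribution(doc_ids: Iterable[str], number_of_shards: int = NUMBER_OF_SHARDS) -> Counter:
    """Count how many of the given ids land on each shard."""
    return Counter(pick_shard(doc_id, number_of_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    print(distribution(f"blog-{i}" for i in range(1000)))
```

Because the hash spreads ids uniformly, each of the two shards receives about half of the documents. That is the pattern _cat/shards shows.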
- Let's run a query on this new index. Perform a query with the following requirements:
  - Query only the documents of shard zero. To do so, use the preference parameter, like this: GET blogs_tmp/_search?preference=_shards:0
  - Query the blogs that mention Agent in the content field.
  - Get only the top 3 results.
  - Filter the _source to display only the title.
  Solution
  GET blogs_tmp/_search?preference=_shards:0
  {
    "size": 3,
    "_source": ["title"],
    "query": {
      "match": { "content": "Agent" }
    }
  }
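The search requests in this lab differ only in the preference parameter. The Python sketch below builds the path and body for each variant. build_search is a hypothetical helper. It only prepares the request and does not send it to a cluster.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_search(index: str, shard: Optional[int] = None, size: int = 3) -> Tuple[str, dict]:
    """Return (path, body) for the lab's "Agent" query.

    With a shard number, preference=_shards:<n> restricts the search to that shard;
    with shard=None the query runs against every shard of the index.
    """
    path = f"{index}/_search"
    if shard is not None:
        # safe=":" keeps the parameter identical to the console form.
        path += "?" + urlencode({"preference": f"_shards:{shard}"}, safe=":")
    body = {
        "size": size,
        "_source": ["title"],
        "query": {"match": {"content": "Agent"}},
    }
    return path, body


if __name__ == "__main__":
    for shard in (0, 1, None):
        path, body = build_search("blogs_tmp", shard)
        print("GET", path)
        print(json.dumps(body))
```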
- Keep track of the number of results and of the titles and scores of the top 3 blogs.
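The values to record are hits.total.value and the title and _score of each returned hit. The Python sketch below extracts them from a parsed _search response. It assumes the Elasticsearch 7.0+ response shape, in which hits.total is an object with value and relation. By default the total is exact up to 10,000 hits. The helper name summarize is hypothetical.

```python
from typing import List, Tuple


def summarize(response: dict) -> Tuple[int, List[Tuple[str, float]]]:
    """Return (total hits, [(title, _score), ...]) from a parsed _search response."""
    hits = response["hits"]
    total = hits["total"]["value"]
    # _source was filtered to ["title"], so each hit carries only the title field.
    top = [(hit["_source"]["title"], hit["_score"]) for hit in hits["hits"]]
    return total, top
```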
- Now run the same query on shard one.
  Solution
  GET blogs_tmp/_search?preference=_shards:1
  {
    "size": 3,
    "_source": ["title"],
    "query": {
      "match": { "content": "Agent" }
    }
  }
- Once again, save the results somewhere so you can easily compare them. Can you already predict what the final result will be?
- Finally, run the same request on the whole index (remove the preference parameter).
  Solution
  GET blogs_tmp/_search
  {
    "size": 3,
    "_source": ["title"],
    "query": {
      "match": { "content": "Agent" }
    }
  }
  You should get the expected top 3 hits (the ones with the highest scores across both shards combined). You can also verify that the total number of hits equals the sum of the totals from each shard.
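The combination step can be modelled in a few lines. In the query phase, each shard returns its own top hits and hit count, and the coordinating node merges them. The Python sketch below is a simplified model under these assumptions:
- The search uses the default query_then_fetch type. Each shard scores with its own term statistics, so the scores seen with preference=_shards:0 and _shards:1 are the same ones that get merged.
- Hits are represented as (title, _score) pairs.
- merge_shard_results is a hypothetical name.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[str, float]  # (title, _score)


def merge_shard_results(per_shard: List[Tuple[int, List[Hit]]], size: int = 3) -> Tuple[int, List[Hit]]:
    """Combine per-shard (total, top hits) into the index-wide answer.

    The index-wide total is the sum of the shard totals. The global top `size`
    hits are the highest-scoring hits among all shards' local top lists.
    Ties keep their input order here; Elasticsearch applies its own tie-breaking.
    """
    total = sum(shard_total for shard_total, _ in per_shard)
    candidates = [hit for _, hits in per_shard for hit in hits]
    top = heapq.nlargest(size, candidates, key=lambda hit: hit[1])
    return total, top
```

Because every shard returns its own top 3, the global top 3 is always among those six candidates. This is why the final result can be predicted from the two per-shard runs.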
Summary:
In this lab, you analyzed the anatomy of search requests. You learned how Elasticsearch combines the results from multiple shards.