Lab 6.3: Distributed operations

Objective:

In this lab, you will perform search requests on different shards and see how Elasticsearch gathers the results.

  1. Create a new index that satisfies the following requirements:

    • it is named blogs_tmp
    • it has two primary shards
    • it has zero replica shards

    Solution
    PUT blogs_tmp
    {
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 0
      }
    }
    
  2. Using the Reindex API, reindex the documents from blogs into blogs_tmp.

    Solution
    POST _reindex
    {
      "source": {
        "index": "blogs"
      },
      "dest": {
        "index": "blogs_tmp"
      }
    }
    
  3. Let's see how Elasticsearch distributed the documents. Use the _cat/shards API to examine the shards of the new index.

    Solution

    GET _cat/shards/blogs_tmp?v
    
    The docs column should show approximately the same number of documents on each shard.
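
The even split follows from how documents are routed to shards. The sketch below illustrates the simplified routing formula, shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. It is only an approximation: Elasticsearch uses a Murmur3 hash, while this sketch swaps in SHA-256 from the Python standard library, so the shard numbers it produces will not match a real cluster. The document ids and the pick_shard helper are hypothetical.

```python
import hashlib
from collections import Counter

NUM_PRIMARY_SHARDS = 2  # same value as number_of_shards for blogs_tmp


def pick_shard(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value (by default the document _id) to a shard number.

    Assumption: SHA-256 stands in for the Murmur3 hash that Elasticsearch
    uses, so only the overall distribution is meaningful here.
    """
    digest = hashlib.sha256(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


# Hypothetical document ids, used only to show the distribution.
doc_ids = [f"doc-{i}" for i in range(10_000)]
docs_per_shard = Counter(pick_shard(doc_id) for doc_id in doc_ids)

for shard in sorted(docs_per_shard):
    print(f"shard {shard}: {docs_per_shard[shard]} docs")
```

Because the hash spreads ids uniformly, each shard ends up with close to half of the documents, which is the pattern to look for in the _cat/shards output.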

  4. Let's run a query on this new index. Perform a query with the following requirements:

    • Query only the documents on shard 0. To do so, use the preference parameter, for example: GET blogs_tmp/_search?preference=_shards:0
    • Match the blogs that mention Agent in the content field.
    • Return only the top 3 results.
    • Filter _source to return only the title field.

    Solution
    GET blogs_tmp/_search?preference=_shards:0
    {
      "size": 3,
      "_source": ["title"], 
      "query": {
        "match": {
          "content": "Agent"
        }
      }
    }
    
  5. Note the total number of hits, and the title and _score of each of the top 3 blogs.

  6. Now run the same query on shard 1.

    Solution
    GET blogs_tmp/_search?preference=_shards:1
    {
      "size": 3,
      "_source": ["title"], 
      "query": {
        "match": {
          "content": "Agent"
        }
      }
    }
    
  7. Once again, note the results so you can compare them easily. Can you already predict the result of a search across both shards?

  8. Finally, run the same request against the whole index by removing the preference parameter.

    Solution
    GET blogs_tmp/_search
    {
      "size": 3,
      "_source": ["title"], 
      "query": {
        "match": {
          "content": "Agent"
        }
      }
    }
    
  9. You should get the expected top 3 hits: the three highest-scoring hits across both shards combined. You can also verify that the total number of hits equals the sum of the totals returned by each shard.
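
The gather step you just observed can be sketched in a few lines of Python. This is a simplified illustration, not how Elasticsearch is implemented: the real coordinating node also handles tie-breaking and fetches the documents in a second phase. The merge_top_hits helper, the titles, the scores, and the totals below are all made up for the example; substitute the values you noted in steps 5 and 7.

```python
import heapq


def merge_top_hits(shard_responses, size=3):
    """Combine per-shard results into one global result.

    Each shard response is a dict with the shard's total hit count and
    its own top hits. The global total is the sum of the shard totals,
    and the global top hits are the highest-scoring hits overall.
    """
    total = sum(resp["total"] for resp in shard_responses)
    all_hits = (hit for resp in shard_responses for hit in resp["hits"])
    top = heapq.nlargest(size, all_hits, key=lambda hit: hit["_score"])
    return total, top


# Made-up results standing in for the responses from shard 0 and shard 1.
shard_0 = {"total": 5, "hits": [
    {"title": "blog A", "_score": 4.2},
    {"title": "blog B", "_score": 3.1},
    {"title": "blog C", "_score": 2.0},
]}
shard_1 = {"total": 7, "hits": [
    {"title": "blog D", "_score": 3.9},
    {"title": "blog E", "_score": 1.5},
    {"title": "blog F", "_score": 1.2},
]}

total, top = merge_top_hits([shard_0, shard_1])
print("total hits:", total)
for hit in top:
    print(hit["_score"], hit["title"])
```

With these sample values the merged total is 12 and the top 3 are blog A, blog D, and blog B: two hits come from shard 0 and one from shard 1, which is why each shard must return its own top 3 rather than a share of them.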

Summary:

In this lab, you analyzed the anatomy of search requests. You learned how Elasticsearch combines the results from multiple shards.