Lab 8.3: Optimizing search performance

Objective:

In this lab, you will use the Profile API to see where a search spends its time and examine ways to make it faster. You will also improve the relevance of your queries.

  1. Start by running a slow query in Console:

    GET blogs_fixed2/_search
    {
      "profile": true, 
      "_source": [""],
      "query": {
        "function_score": {
          "query": {
            "match_all": {}
          },
          "script_score": {
            "script": """
              void slow() {
                for (int x = 0; x < 10000; ++x) 
                  Math.log(x);
              } 
    
              for (int x = 0; x < 3; ++x) 
                slow();
            """
          }
        }
      }
    }
    

  2. The profile output appears below the hits section of the response. Notice that the raw JSON is somewhat complicated to read. Let's look at a clearer view with the Search Profiler.
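
The raw profile JSON from step 1 can also be summarized outside Kibana. The following is a minimal Python sketch, not part of the lab itself. It assumes the response has already been parsed into a dict and that each profiled query node carries type, description, time_in_nanos, and an optional children list. The sample response below, including its type names and timings, is invented for illustration.

```python
def flatten_profile(response):
    """Yield (shard_id, depth, type, description, time_in_nanos) for each query node."""

    def walk(node, depth):
        # Emit this node, then recurse into its children (if any).
        yield depth, node["type"], node.get("description", ""), node.get("time_in_nanos", 0)
        for child in node.get("children", []):
            yield from walk(child, depth + 1)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                for depth, qtype, desc, nanos in walk(root, 0):
                    yield shard.get("id", "?"), depth, qtype, desc, nanos


def slowest(response, n=3):
    """Return the n query nodes with the largest time_in_nanos, slowest first."""
    rows = sorted(flatten_profile(response), key=lambda row: row[4], reverse=True)
    return rows[:n]


# Hypothetical, heavily trimmed response; real output has more fields per node.
sample = {
    "profile": {
        "shards": [
            {
                "id": "[node][blogs_fixed2][0]",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "FunctionScoreQuery",
                                "description": "function score (*:*)",
                                "time_in_nanos": 9000000,
                                "children": [
                                    {
                                        "type": "MatchAllDocsQuery",
                                        "description": "*:*",
                                        "time_in_nanos": 40000,
                                    }
                                ],
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

for shard_id, depth, qtype, desc, nanos in slowest(sample):
    print(f"{shard_id} {'  ' * depth}{qtype}: {nanos / 1e6:.2f} ms")
```

Keep in mind that a parent node's time includes the time of its children, so the top-level node usually sorts first. The interesting comparison is between siblings and between a parent and the sum of its children.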

  3. View the Search Profiler page in Kibana (click the tab to the right of Console in Dev Tools). Set the index to blogs_fixed2 and set the body of the query to the following, then click the Profile button:

    {
      "query": {
        "function_score": {
          "query": {
            "match_all": {}
          },
          "script_score": {
            "script": """
            void slow() {
              for (int x = 0; x < 10000; ++x) 
                Math.log(x);
            } 
    
            for (int x = 0; x < 3; ++x) 
              slow();
            """
          }
        }
      }
    }
    

  4. Using the Search Profiler, see if you can determine the "slow" part of the query (which we already know is the long for loop).

  5. Let's profile another search, this time one that combines a query with an aggregation. Set the index to blogs_fixed2 and set the body of the query to the following:

    {
      "query": {
        "bool": {
          "must": [
            {"match": {
              "title": "logstash"
            }}
          ],
          "must_not": [
            {"match": {
              "search_tags": "kibana"
            }}
          ],
          "should": [
            {"match_phrase": {
              "content": "dead letter queue"
            }}
          ]
        }
      },
      "aggs": {
        "author": {
          "terms": {
            "field": "authors.last_name"
          }
        }
      }
    }
    

  6. Each section of the search is broken down into the component tasks in the output. Which part of this search is taking the longest? How might this be improved?

    Solution

    Chances are, it's the match_phrase query for dead letter queue. Since it's in a should clause, we could replace it with a match query and probably get similar results. Recall that when a bool query also has a must clause, a "should" only impacts the score and, therefore, the order of the results, not which results are actually returned. Try a change and see!
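
If you keep your search bodies in a script rather than in Console, the change suggested above can be sketched as follows. This is an illustrative Python helper, not part of the lab. The function name relax_should_phrases is hypothetical, and it only rewrites clauses directly under bool.should.

```python
import copy

# Search body from step 5, written as a Python dict.
original = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "logstash"}}],
            "must_not": [{"match": {"search_tags": "kibana"}}],
            "should": [{"match_phrase": {"content": "dead letter queue"}}],
        }
    },
    "aggs": {"author": {"terms": {"field": "authors.last_name"}}},
}


def relax_should_phrases(body):
    """Return a copy of body with match_phrase clauses in bool.should turned into match."""
    new_body = copy.deepcopy(body)  # leave the caller's dict untouched
    should = new_body.get("query", {}).get("bool", {}).get("should", [])
    for i, clause in enumerate(should):
        if "match_phrase" in clause:
            # Same field and text, but without the positional (phrase) check.
            should[i] = {"match": clause["match_phrase"]}
    return new_body


relaxed = relax_should_phrases(original)
print(relaxed["query"]["bool"]["should"])
```

Paste the resulting body into the Search Profiler and compare the timings with the original. Note that match uses OR between terms by default, so blogs containing only some of the three words also receive a score contribution.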

  7. Next, run the following query that searches for the term boosting among the fields title and content:

    GET blogs_fixed2/_search
    {
      "_source": [
        "title"
      ],
      "query": {
        "multi_match": {
          "query": "boosting",
          "fields": [
            "content",
            "title"
          ]
        }
      }
    }
    

  8. Analyze the results closely. You should notice that the blogs with the term boosting in their title don't always appear first.

  9. Update the previous query to give the title field a higher weight (1.4).

    Solution

    GET blogs_fixed2/_search
    {
      "_source": [
        "title"
      ],
      "query": {
        "multi_match": {
          "query": "boosting",
          "fields": [
            "content",
            "title^1.4"
          ]
        }
      }
    }
    
    The three blogs with the term boosting in their title should rank higher.

  10. EXAM PREP: By default, Elasticsearch uses the maximum score from the two fields to compute the final score. Update the previous query to use the sum of the field scores instead of the default best_fields behavior.

    Solution

    GET blogs_fixed2/_search
    {
      "_source": [
        "title"
      ],
      "query": {
        "multi_match": {
          "type": "most_fields", 
          "query": "boosting",
          "fields": [
            "content", 
            "title^1.4"
          ]
        }
      }
    }
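
To see why the boost in step 9 and the type change in step 10 reorder the hits, the arithmetic can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Elasticsearch computes scores internally. The per-field scores for doc_a and doc_b are invented, best_fields is modeled as the maximum field score plus tie_breaker times the remaining scores, and most_fields is modeled as the plain sum described in step 10.

```python
def apply_boosts(scores, boosts):
    """Multiply each field score by its boost (1.0 when no boost is given)."""
    return {field: score * boosts.get(field, 1.0) for field, score in scores.items()}


def best_fields(scores, tie_breaker=0.0):
    """Best field score, plus tie_breaker times the scores of the other fields."""
    top = max(scores.values())
    return top + tie_breaker * (sum(scores.values()) - top)


def most_fields(scores):
    """Sum of the scores of all matching fields."""
    return sum(scores.values())


# Hypothetical per-field scores for two blogs matching "boosting".
doc_a = {"title": 3.0, "content": 2.0}  # term appears in title and content
doc_b = {"content": 3.5}                # term appears in content only

boosts = {"title": 1.4}                 # corresponds to "title^1.4"

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    boosted = apply_boosts(doc, boosts)
    print(
        name,
        "best_fields:", round(best_fields(doc), 2),
        "boosted best_fields:", round(best_fields(boosted), 2),
        "boosted most_fields:", round(most_fields(boosted), 2),
    )
```

With these made-up numbers, doc_b outranks doc_a under plain best_fields because only the single best field counts. Boosting title lifts doc_a above doc_b, and most_fields widens the gap further because doc_a's content score is added as well.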
    

Summary:

In this lab, you saw how to profile inefficient searches to identify which parts are slow and where to improve them. You also learned how to improve the relevance of your queries.