Lab 4.3: Runtime fields

Objective:

In this lab, you will learn how to write Painless scripts. You will also define runtime fields.

  1. One way you can use Painless is in a script query. Run the following script query, which returns true for every document and therefore behaves like a "match all" query:

    GET blogs_fixed2/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": """
                  return true;
                """
              }
            }
          ]
        }
      }
    }
    

  2. Now write a script query that returns all blogs whose url is at least 100 characters long. Call the length() function on the value of the url field to get the number of characters. You should get 37 hits.

    Solution
    GET blogs_fixed2/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": """
                  return doc['url'].value.length() >= 100;
                """
              }
            }
          ]
        }
      }
    }
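
    The threshold can also be passed as a parameter instead of being hard-coded, which lets Elasticsearch reuse the compiled script when only the value changes. The following is a minimal sketch and not part of the lab's expected solution: the parameter name min_length is an assumption chosen for illustration, and the size() check is a defensive guard for documents that might not have a url value.

    GET blogs_fixed2/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": {
                  "source": """
                    // Skip documents with no url value, then compare against the parameter.
                    if (doc['url'].size() == 0) {
                      return false;
                    }
                    return doc['url'].value.length() >= params.min_length;
                  """,
                  "params": {
                    "min_length": 100
                  }
                }
              }
            }
          ]
        }
      }
    }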
    
  3. Write a script query that returns all blogs where an authors.last_name value starts with the letter "K". There are multiple ways to write this code, but you can use the startsWith() function. You should get 433 hits. HINT: The authors field is an array, so you need to iterate through all of its elements.

    Solution

    There are certainly different ways to write this query, but here is a solution:

    GET blogs_fixed2/_search
    {
      "_source": "authors", 
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": """
                  def authors = doc["authors.last_name"];
                  for (int i = 0; i < authors.size(); i++) {
                    if (authors.get(i).startsWith("K")) {
                      return true;
                    }
                  }
                  return false;
                """
              }
            }
          ]
        }
      }
    }
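
    As an alternative sketch of the same check, Painless also accepts a for-each loop over the doc values, which avoids the index variable. The variable name lastName is only illustrative.

    GET blogs_fixed2/_search
    {
      "_source": "authors",
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": """
                  // Visit every last name stored for this document.
                  for (def lastName : doc["authors.last_name"]) {
                    if (lastName.startsWith("K")) {
                      return true;
                    }
                  }
                  return false;
                """
              }
            }
          ]
        }
      }
    }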
    

  4. Write a script query that returns blogs where the product field of the tags object contains at least 3 values. You should get 616 hits.

    Solution
    GET blogs_fixed2/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "script": {
                "script": """
                  return doc['tags.product'].size() >= 3;
                """
              }
            }
          ]
        }
      }
    }
    
  5. OPTIONAL: If you want more practice with Painless, take a look at the Painless Lab in Dev Tools.

  6. OPTIONAL: In the Painless Lab, play with the sample code by replacing all the asterisks (*) in the smiley with zeros (0) and all the dots (.) with ones (1).

  7. Suppose you want to determine which day of the week had the most blog postings. Currently, the documents do not store the day of the week on which a blog was posted, only the publish_date field in ISO date format. If this is not an aggregation we will run often, we can use a runtime field to do the calculation. Run a search on blogs_fixed2 that satisfies the following requirements:

    • defines a runtime field named day_of_week of type keyword in the runtime_mappings section
    • uses the following Painless code to calculate the day of the week from the publish_date field:
      doc['publish_date'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT)
      
    • contains a terms aggregation on the day_of_week field

    You should see that Wednesday is the most popular day to publish a blog, followed closely by Tuesday.

    Solution
    GET blogs_fixed2/_search
    {
      "size": 0,
      "runtime_mappings": {
        "day_of_week": {
          "type": "keyword",
          "script": {
            "source": "emit(doc['publish_date'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
          }
        }
      },
      "aggs": {
        "top_days": {
          "terms": {
            "field": "day_of_week"
          }
        }
      }
    }
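
    To inspect the computed value on individual documents rather than only in an aggregation, the runtime field can be requested through the fields option of the search. This is a sketch for spot-checking the script and not part of the lab's solution; the size of 3 is arbitrary.

    GET blogs_fixed2/_search
    {
      "size": 3,
      "_source": false,
      "runtime_mappings": {
        "day_of_week": {
          "type": "keyword",
          "script": {
            "source": "emit(doc['publish_date'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
          }
        }
      },
      "fields": [
        "publish_date",
        "day_of_week"
      ]
    }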
    
  8. Another handy use of runtime fields is to temporarily access a field that was disabled during indexing. (No Painless is required here!) To see the issue, try running the following aggregation. Notice there are no results, because the authors.uid field is not indexed:

    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "top_uids": {
          "terms": {
            "field": "authors.uid"
          }
        }
      }
    }
    

  9. Using runtime fields, you can make an existing field available to searches and aggregations at query time simply by declaring that field in the runtime_mappings section. Because no script is provided, Elasticsearch reads the value from _source. Run the following search, which treats the authors.uid field as a keyword just for the execution of this search request:

    GET blogs_fixed2/_search
    {
      "size": 0,
      "runtime_mappings": {
        "authors.uid": {
          "type": "keyword"
        }
      },
      "aggs": {
        "top_uids": {
          "terms": {
            "field": "authors.uid"
          }
        }
      }
    }
    

  10. You can use runtime fields to change the mapping of a field just for a specific search request. For example, the authors.full_name field is currently text, but you can still search it as a keyword field. Write a search on blogs_fixed2 that queries the authors.full_name field as a keyword field for the value "Jongmin Kim".

    Solution

    GET blogs_fixed2/_search
    {
      "runtime_mappings": {
        "authors.full_name": {
          "type": "keyword"
        }
      },
      "query": {
        "match": {
          "authors.full_name": "Jongmin Kim"
        }
      }
    }
    
    Try running the same query without the runtime mapping to see the difference.
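
    Because the runtime field is a keyword, the query text is not split into separate terms, so a term query should behave the same way as the match query above. The following variation is a sketch for comparison and is not part of the lab's solution.

    GET blogs_fixed2/_search
    {
      "runtime_mappings": {
        "authors.full_name": {
          "type": "keyword"
        }
      },
      "query": {
        "term": {
          "authors.full_name": "Jongmin Kim"
        }
      }
    }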

Summary:

In this lab, you wrote some Painless code within script queries. You also used runtime fields to compute a value at query time, to access a field that was not indexed, and to change a field's mapping for a single search request.