Lab 2.4: Types and parameters

Objective:

In this lab you will continue working with the blogs data to improve the functionality of the index. You can consider this entire lab as exam preparation for the Elastic Certified Engineer exam.

  1. In the previous lab, you changed the tags fields (tags.elastic_stack, tags.industry, tags.level, etc.) into keyword fields. Querying all these separate fields is possible, but not optimal. Let's copy all the individual tags fields into one search_tags field that can be queried with a simple match query. At the same time, let's also apply the content_analyzer to the content field.

    • create a new index named blogs_fixed2. Use the mapping and settings of blogs_fixed as the starting point
    • add a new keyword field to the blogs_fixed2 mapping named search_tags
    • using copy_to, copy the values of all the tags to search_tags
    • disable doc values for the new search_tags field (it will not be used for sorting or aggregations)
    • completely disable the authors.uid field
    • apply the custom content_analyzer from lab 2.3 to the content field (don't forget that the analyzer needs to be defined in the settings!)
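
    Hint: one way to get the starting point, assuming blogs_fixed still exists from the previous lab, is to retrieve its mapping and settings and copy the relevant parts into the new request. From the settings response only the analysis section would be needed here; generated values such as the index uuid and creation_date cannot be reused.

    GET blogs_fixed/_mapping

    GET blogs_fixed/_settings
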
    Solution
    PUT blogs_fixed2
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "content_analyzer": {
              "tokenizer": "standard",
              "filter": ["lowercase"],
              "char_filter": ["html_strip"]
            }
          }
        }
      },
      "mappings": {
        "_meta": {
          "created_by": "Elastic Student"
        },
        "properties": {
          "authors": {
            "properties": {
              "company": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "first_name": {
                "type": "keyword"
              },
              "full_name": {
                "type": "text"
              },
              "job_title": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "last_name": {
                "type": "keyword"
              },
              "uid": {
                "enabled": false
              }
            }
          },
          "category": {
            "type": "keyword"
          },
          "content": {
            "type": "text",
            "analyzer": "content_analyzer"
          },
          "locale": {
            "type": "keyword"
          },
          "publish_date": {
            "type": "date",
            "format": "iso8601"
          },
          "search_tags": {
            "type": "keyword",
            "doc_values": false
          },
          "tags": {
            "properties": {
              "elastic_stack": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "industry": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "level": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "product": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "tags": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "topic": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "use_case": {
                "type": "keyword",
                "copy_to": "search_tags"
              },
              "use_cases": {
                "type": "keyword",
                "copy_to": "search_tags"
              }
            }
          },
          "title": {
            "type": "text"
          },
          "url": {
            "type": "keyword"
          }
        }
      }
    }
    
  2. Reindex all the blogs into blogs_fixed2.

    Solution
    POST _reindex
    {
      "source": {
        "index": "blogs"
      },
      "dest": {
        "index": "blogs_fixed2"
      }
    }
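
    As a quick sanity check (a sketch, not part of the official solution), you could compare the document counts of the two indices. They should match once the reindex has completed and blogs_fixed2 has refreshed.

    GET blogs/_count

    GET blogs_fixed2/_count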
    
  3. In lab 2.3, you queried the blogs index for "quot" in the content field and found 826 hits. Repeat the query on the blogs_fixed2 index:

    GET blogs_fixed2/_search
    {
      "query": {
        "match": {
          "content": "quot"
        }
      }
    }
    
    You'll now find only one hit. That blog post actually discusses HTML, so it is a genuine match. For the other blogs, the html_strip character filter in the custom analyzer stripped the HTML markup and decoded entities such as &quot; before tokenization, so the token "quot" was never written to the blogs_fixed2 index.
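
    To see the analyzer at work, here is a minimal sketch using the _analyze API. The sample text is made up for illustration. The response should contain the tokens she, said and hello, with no quot token.

    POST blogs_fixed2/_analyze
    {
      "analyzer": "content_analyzer",
      "text": "<p>She said &quot;hello&quot;</p>"
    }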

  4. Run a match query on the search_tags field for the value "logstash". You should get 381 results. Examine a few of the results to see which tag section actually contains the value logstash:

    GET blogs_fixed2/_search
    {
      "query": {
        "match": {
          "search_tags": "logstash"
        }
      }
    }
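
    Note that copy_to does not modify _source, so search_tags itself will not appear in the hits. To make the inspection easier, one option is to return only the title and the original tags fields, as sketched below. The size of 5 is arbitrary.

    GET blogs_fixed2/_search
    {
      "size": 5,
      "_source": ["title", "tags.*"],
      "query": {
        "match": {
          "search_tags": "logstash"
        }
      }
    }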
    

  5. Run the following aggregation on the search_tags field:

    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "top_search_tags": {
          "terms": {
            "field": "search_tags",
            "size": 10
          }
        }
      }
    }
    
    The result is an error. Why?

    Solution

    The search_tags field does not have doc values enabled. As a result, you cannot aggregate on that field.
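
    If you do need tag counts, one option (a sketch, not part of the lab solution) is to aggregate on one of the original tags fields instead, since keyword fields have doc values enabled by default. The aggregation name top_topics is made up for this example.

    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "top_topics": {
          "terms": {
            "field": "tags.topic",
            "size": 10
          }
        }
      }
    }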

  6. Run the following aggregation on the authors.uid field:

    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "top_author_uids": {
          "terms": {
            "field": "authors.uid",
            "size": 10
          }
        }
      }
    }
    
    Why does this aggregation not return any results?

    Solution

    The authors.uid field has been disabled. From a query and aggregation perspective, it's as if that field does not exist.
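
    The values themselves are not lost: a disabled field is still kept in _source. As a quick check (a sketch, assuming the source documents contain authors.uid), an exists query on the field should return no hits, while source filtering still returns the original values.

    GET blogs_fixed2/_search
    {
      "query": {
        "exists": {
          "field": "authors.uid"
        }
      }
    }

    GET blogs_fixed2/_search
    {
      "size": 1,
      "_source": ["authors.uid"]
    }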

Summary:

In this lab, you explored mapping parameters such as copy_to, doc_values, and enabled, and applied a custom analyzer to a text field.