Lab 2.2: Overview of mappings

Objective:

In this lab, you are going to work through the process of defining a custom mapping.

  1. Index the following sample document, which also creates a new index called sample_blog:

    POST sample_blog/_doc
    {
      "@timestamp": "2021-03-10T16:00:00.000Z",
      "abstract": "The Joy of Painting",
      "author": "Bob Ross",
      "body": "Painting should do one thing. It should put happiness in your heart. We'll take a little bit of Van Dyke Brown. Isn't that fantastic? You can just push a little tree out of your brush like that. Mix your color marbly don't mix it dead.",
      "body_word_count": 55,
      "category": "Painting",
      "title": "Making Happy Little Trees",
      "url": "/blog/happy-little-trees",
      "published": true
    }
    

  2. View the default mappings that were created. Elasticsearch did its best to guess the data types, but notice that most of the string fields are mapped as both text and keyword:

    GET sample_blog/_mapping
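
    To inspect a single field instead of the whole mapping, the get field mapping API can be used. As a quick sketch (the author field is chosen here only as an illustration), the request below should show that a dynamically mapped string becomes a text field with a keyword sub-field:

    GET sample_blog/_mapping/field/author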
    

  3. Create a new index called test_blogs based on the sample_blog mapping. Configure test_blogs to satisfy the following requirements:

    • @timestamp is a date
    • body_word_count is an integer
    • the abstract, body, and title fields are of type text only
    • the author, category and url fields are of type keyword only
    • published is of type boolean
    Solution
    PUT test_blogs
    {
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "abstract": {
            "type": "text"
          },
          "author": {
            "type": "keyword"
          },
          "body": {
            "type": "text"
          },
          "body_word_count": {
            "type": "integer"
          },
          "category": {
            "type": "keyword"
          },
          "title": {
            "type": "text"
          },
          "url": {
            "type": "keyword"
          },
          "published": {
            "type": "boolean"
          }
        }
      }
    }
    
  4. Index the document from step 1 into your new test_blogs index. The document should be indexed without any issues with the mapping or data types.

    Solution
    POST test_blogs/_doc
    {
      "@timestamp": "2021-03-10T16:00:00.000Z",
      "abstract": "The Joy of Painting",
      "author": "Bob Ross",
      "body": "Painting should do one thing. It should put happiness in your heart. We'll take a little bit of Van Dyke Brown. Isn't that fantastic? You can just push a little tree out of your brush like that. Mix your color marbly don't mix it dead.",
      "body_word_count": 55,
      "category": "Painting",
      "title": "Making Happy Little Trees",
      "url": "/blog/happy-little-trees",
      "published": true
    }
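
    To check that indexing did not alter the mapping, you can view it again. This is an optional sanity check rather than part of the original solution: the output should contain only the fields defined in step 3. If a field name in the document was misspelled, dynamic mapping would have added it as an extra field, so this check also catches typos.

    GET test_blogs/_mapping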
    
  5. Now let's make some changes to the blogs index. First, create a new blogs_fixed index.

    PUT blogs_fixed
    

  6. View the current mapping of blogs:

    GET blogs/_mapping
    

  7. The mappings for blogs are closer to the kind of optimizations we like to see. For instance, notice that shorter string fields like category are mapped as keyword only, while longer string fields like content are mapped as text only.

  8. The mapping is not perfect though. For example, the authors and tags objects contain fields that are mapped as both text and keyword. This is inefficient, because most of these fields do not need both types. Let's make some changes in a new index mapping.
    Add the existing mappings from blogs to the blogs_fixed index, but do not run the command as you will modify the mapping first:

    PUT blogs_fixed/_mapping
    {
      "_meta" : {
        "created_by" : "ml-file-data-visualizer"
      },
      "properties" : {
        "authors" : {
          "properties" : {
            "company" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "first_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "full_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "job_title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "last_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "uid" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "category" : {
          "type" : "keyword"
        },
        "content" : {
          "type" : "text"
        },
        "locale" : {
          "type" : "keyword"
        },
        "publish_date" : {
          "type" : "date",
          "format" : "iso8601"
        },
        "tags" : {
          "properties" : {
            "elastic_stack" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "industry" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "level" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "product" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "tags" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "topic" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "use_case" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "use_cases" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "title" : {
          "type" : "text"
        },
        "url" : {
          "type" : "keyword"
        }
      }
    }
    

  9. EXAM PREP: Modify the mapping for blogs_fixed so that it satisfies the following requirements:

    • change the _meta.created_by field to your name, since this mapping will be defined by you
    • in the authors object, the first_name, last_name and uid fields are keyword only
    • in the authors object, the full_name field is text only
    • all fields within the tags object are keyword only
    Solution
    PUT blogs_fixed/_mapping
    {
      "_meta": {
        "created_by": "Elastic Student"
      },
      "properties": {
        "authors": {
          "properties": {
            "company": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "first_name": {
              "type": "keyword"
            },
            "full_name": {
              "type": "text"
            },
            "job_title": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "last_name": {
              "type": "keyword"
            },
            "uid": {
              "type": "keyword"
            }
          }
        },
        "category": {
          "type": "keyword"
        },
        "content": {
          "type": "text"
        },
        "locale": {
          "type": "keyword"
        },
        "publish_date": {
          "type": "date",
          "format": "iso8601"
        },
        "tags": {
          "properties": {
            "elastic_stack": {
              "type": "keyword"
            },
            "industry": {
              "type": "keyword"
            },
            "level": {
              "type": "keyword"
            },
            "product": {
              "type": "keyword"
            },
            "tags": {
              "type": "keyword"
            },
            "topic": {
              "type": "keyword"
            },
            "use_case": {
              "type": "keyword"
            },
            "use_cases": {
              "type": "keyword"
            }
          }
        },
        "title": {
          "type": "text"
        },
        "url": {
          "type": "keyword"
        }
      }
    }
    
  10. Now you can run the request to add the new mapping you've just defined to the blogs_fixed index. The blogs_fixed index does not have any data in it yet. You will index data into it next.
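
    After running the request, one optional way to spot-check the result is to retrieve only the fields you changed. This sketch uses the get field mapping API with a comma-separated list and a wildcard; the selection of fields is just an example:

    GET blogs_fixed/_mapping/field/authors.first_name,authors.full_name,tags.*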

  11. Reindex all of the documents from the blogs index into your new blogs_fixed index.

    Solution
    POST _reindex
    {
      "source": {
        "index": "blogs"
      },
      "dest": {
        "index": "blogs_fixed"
      }
    }
    

    The reindex request will take a few moments, but should run fairly quickly. If it times out, do not panic and do not run the reindex command again. It just means it took more than 1 minute, and Console stopped waiting for the response. The request will continue to run in the background though.
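
    If Console does stop waiting, the task management API offers one way to check whether the reindex is still running. This is a minimal sketch; an empty list of tasks in the response suggests that the reindex has finished:

    GET _tasks?detailed=true&actions=*reindex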

  12. Run the following command to see how many documents are in blogs_fixed. You will know the reindexing is complete when blogs_fixed has 4,719 documents.

    GET blogs_fixed/_count
    

  13. Let's confirm the changes worked with a simple test. Run these two queries in Console:

    GET blogs/_search
    {
      "query": {
        "match": {
          "authors.first_name": "kim"
        }
      }
    }
    
    GET blogs_fixed/_search
    {
      "query": {
        "match": {
          "authors.first_name": "kim"
        }
      }
    }
    
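    The first query should return 3 hits, while the second returns none. In blogs, authors.first_name is a text field, so the value "Kim" was analyzed and indexed in lowercase, and the search term "kim" is analyzed the same way. In blogs_fixed, the field is of type keyword, which is indexed exactly as provided, so only an exact, case-sensitive match finds it.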
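
    The analyze API makes this difference visible. The sketch below assumes that the text fields in blogs use the default standard analyzer (no other analyzer appears in the mapping shown earlier). The first request returns the lowercased token kim, which is what a text field stores and what the match query searches for. The second request uses the keyword analyzer, which returns the input unchanged as the single token Kim, similar to how a keyword field indexes the whole value as is.

    GET _analyze
    {
      "analyzer": "standard",
      "text": "Kim"
    }

    GET _analyze
    {
      "analyzer": "keyword",
      "text": "Kim"
    }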

  14. Search for "Kim" (capital K) as the authors.first_name in blogs_fixed and you should get 3 hits.
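
    A possible form of this query, reusing the match query from the previous step with only the capitalization changed:

    GET blogs_fixed/_search
    {
      "query": {
        "match": {
          "authors.first_name": "Kim"
        }
      }
    }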

  15. Search for blogs with "security analytics" as a value in tags.use_case in the blogs_fixed index. You should get 216 hits:

    GET blogs_fixed/_search
    {
      "query": {
        "match": {
          "tags.use_case": "security analytics"
        }
      }
    }
    

  16. Run the previous query for "security analytics" on the original blogs index. You should get 598 hits. Why are there so many more hits?

    Solution

    The tags.use_case field in the blogs index is of type text. The match query analyzes "security analytics" into the terms "security" and "analytics" and, by default, returns documents that contain either term. You are therefore also finding blogs that have, for example, "business analytics" as a use case.
    The tags.use_case field in the blogs_fixed index is of type keyword. When you ran the same search on the blogs_fixed index, you found only blogs that have the exact use case of "security analytics". This demonstrates the difference between the text and keyword fields.
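
    To see the effect of the default OR behavior, one optional experiment (not part of the original lab) is to require both terms on the original blogs index by setting the match query's operator parameter to and. No hit count is given here because it depends on the data; it may still differ from the keyword result, since a text match on both terms does not require them to be adjacent or to come from the same tag value.

    GET blogs/_search
    {
      "query": {
        "match": {
          "tags.use_case": {
            "query": "security analytics",
            "operator": "and"
          }
        }
      }
    }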

Summary:

In this lab, you created a new index that improves on the original blogs mapping. There are still some improvements to make on blogs, but for now you should be familiar with the process of defining a mapping for an index.