Lab 2.2: Overview of mappings
Objective:
In this lab, you are going to work through the process of defining a custom mapping.
-
Index the following sample document, which also creates a new index called sample_blog:
    POST sample_blog/_doc
    {
      "@timestamp": "2021-03-10T16:00:00.000Z",
      "abstract": "The Joy of Painting",
      "author": "Bob Ross",
      "body": "Painting should do one thing. It should put happiness in your heart. We'll take a little bit of Van Dyke Brown. Isn't that fantastic? You can just push a little tree out of your brush like that. Mix your color marbly don't mix it dead.",
      "body_word_count": 55,
      "category": "Painting",
      "title": "Making Happy Little Trees",
      "url": "/blog/happy-little-trees",
      "published": true
    }
-
View the default mappings that were created. Elasticsearch did its best to guess the data types, but notice that many of the fields are mapped as both text and keyword:
    GET sample_blog/_mapping
-
Create a new index called test_blogs based on the sample_blog mapping. Configure test_blogs to satisfy the following requirements:
  - @timestamp is a date
  - body_word_count is an integer
  - the abstract, body, and title fields are of type text only
  - the author, category and url fields are of type keyword only
  - published is of type boolean
Solution
    PUT test_blogs
    {
      "mappings": {
        "properties": {
          "@timestamp": { "type": "date" },
          "abstract": { "type": "text" },
          "author": { "type": "keyword" },
          "body": { "type": "text" },
          "body_word_count": { "type": "integer" },
          "category": { "type": "keyword" },
          "title": { "type": "text" },
          "url": { "type": "keyword" },
          "published": { "type": "boolean" }
        }
      }
    }
-
Index the document from step 1 into your new test_blogs index. The document should be indexed without any issues with the mapping or data types.
Solution
    POST test_blogs/_doc
    {
      "@timestamp": "2021-03-10T16:00:00.000Z",
      "abstract": "The Joy of Painting",
      "author": "Bob Ross",
      "body": "Painting should do one thing. It should put happiness in your heart. We'll take a little bit of Van Dyke Brown. Isn't that fantastic? You can just push a little tree out of your brush like that. Mix your color marbly don't mix it dead.",
      "body_word_count": 55,
      "category": "Painting",
      "title": "Making Happy Little Trees",
      "url": "/blog/happy-little-trees",
      "published": true
    }
-
Now let's make some changes to the blogs index. First, create a new blogs_fixed index.
    PUT blogs_fixed
-
View the current mapping of blogs:
    GET blogs/_mapping
-
The mappings for blogs are closer to the kind of optimizations we like to see. For instance, notice that shorter string fields like category are mapped as keyword only, while longer string fields like content are mapped as text only.
-
The mapping is not perfect though. For example, the authors and tags objects contain fields that are mapped as both text and keyword. This is not very efficient, as most of these fields will not need both. Let's make some changes in a new index mapping.
Add the existing mappings from blogs to the blogs_fixed index, but do not run the command yet, as you will modify the mapping first:
    PUT blogs_fixed/_mapping
    {
      "_meta": { "created_by": "ml-file-data-visualizer" },
      "properties": {
        "authors": {
          "properties": {
            "company": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "first_name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "full_name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "job_title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "last_name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "uid": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
          }
        },
        "category": { "type": "keyword" },
        "content": { "type": "text" },
        "locale": { "type": "keyword" },
        "publish_date": { "type": "date", "format": "iso8601" },
        "tags": {
          "properties": {
            "elastic_stack": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "industry": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "level": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "product": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "tags": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "topic": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "use_case": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "use_cases": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
          }
        },
        "title": { "type": "text" },
        "url": { "type": "keyword" }
      }
    }
-
EXAM PREP: Modify the mapping for blogs_fixed so that it satisfies the following requirements:
  - change the _meta.created_by field to your name, since this mapping will be defined by you
  - in the authors object, the first_name, last_name and uid fields are keyword only
  - in the authors object, the full_name field is text only
  - all fields within the tags object are keyword only
Solution
    PUT blogs_fixed/_mapping
    {
      "_meta": { "created_by": "Elastic Student" },
      "properties": {
        "authors": {
          "properties": {
            "company": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "first_name": { "type": "keyword" },
            "full_name": { "type": "text" },
            "job_title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
            "last_name": { "type": "keyword" },
            "uid": { "type": "keyword" }
          }
        },
        "category": { "type": "keyword" },
        "content": { "type": "text" },
        "locale": { "type": "keyword" },
        "publish_date": { "type": "date", "format": "iso8601" },
        "tags": {
          "properties": {
            "elastic_stack": { "type": "keyword" },
            "industry": { "type": "keyword" },
            "level": { "type": "keyword" },
            "product": { "type": "keyword" },
            "tags": { "type": "keyword" },
            "topic": { "type": "keyword" },
            "use_case": { "type": "keyword" },
            "use_cases": { "type": "keyword" }
          }
        },
        "title": { "type": "text" },
        "url": { "type": "keyword" }
      }
    }
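Editing a mapping this size by hand is error-prone, so the same changes could also be applied programmatically before pasting the result into Console. The following is a minimal Python sketch, not part of the lab itself: the function name fix_blogs_mapping is hypothetical, and the original dictionary is an abbreviated stand-in for the blogs mapping that shows only a few fields.

```python
import copy

# The text + keyword multi-field shape used throughout the blogs mapping.
MULTI = {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}

# Abbreviated stand-in for the blogs mapping (assumption: only a few fields shown).
original = {
    "_meta": {"created_by": "ml-file-data-visualizer"},
    "properties": {
        "authors": {"properties": {
            "company": copy.deepcopy(MULTI),
            "first_name": copy.deepcopy(MULTI),
            "full_name": copy.deepcopy(MULTI),
            "last_name": copy.deepcopy(MULTI),
            "uid": copy.deepcopy(MULTI),
        }},
        "tags": {"properties": {
            "use_case": copy.deepcopy(MULTI),
            "topic": copy.deepcopy(MULTI),
        }},
        "title": {"type": "text"},
    },
}


def fix_blogs_mapping(mapping, created_by):
    """Return a modified copy of the mapping that meets the lab requirements."""
    fixed = copy.deepcopy(mapping)  # leave the input untouched
    fixed["_meta"] = {"created_by": created_by}

    authors = fixed["properties"]["authors"]["properties"]
    for name in ("first_name", "last_name", "uid"):
        authors[name] = {"type": "keyword"}  # keyword only
    authors["full_name"] = {"type": "text"}  # text only

    tags = fixed["properties"]["tags"]["properties"]
    for name in list(tags):
        tags[name] = {"type": "keyword"}  # every tags field is keyword only
    return fixed


fixed = fix_blogs_mapping(original, "Elastic Student")
```

Fields that are not named in the requirements, such as authors.company, keep their original text and keyword multi-field mapping.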
-
Now you can run the request to add the new mapping you've just defined to the blogs_fixed index. The blogs_fixed index does not have any data in it yet. You will index data into it next.
-
Reindex all of the documents from the blogs index into your new blogs_fixed index.
Solution
    POST _reindex
    {
      "source": { "index": "blogs" },
      "dest": { "index": "blogs_fixed" }
    }
The reindex request will take a few moments, but should run fairly quickly. If it times out, do not panic and do not run the reindex command again. It just means it took more than 1 minute, and Console stopped waiting for the response. The request will continue to run in the background though.
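Because the request keeps running after Console stops waiting, one way to tell when it has finished is to poll the document count of the destination index until it reaches the expected total. The Python sketch below illustrates that loop under stated assumptions: wait_for_count is a hypothetical helper, and get_count stands for any callable that returns the current count (for example, one that reads the count value from a GET blogs_fixed/_count response). A fake counter is used here so the sketch runs without a cluster.

```python
import time


def wait_for_count(get_count, expected, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll get_count() until it reports at least `expected` documents."""
    waited = 0.0
    while True:
        current = get_count()
        if current >= expected:
            return current
        if waited >= timeout_s:
            raise TimeoutError(f"only {current} of {expected} documents after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Fake counter standing in for repeated _count calls (values are illustrative,
# ending at the 4,719 documents this lab expects in blogs_fixed).
fake_counts = iter([0, 1500, 4719])
final = wait_for_count(lambda: next(fake_counts), expected=4719, sleep=lambda _s: None)
```

Polling the count avoids re-running the reindex request, which would start a second copy of the same work.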
-
Run the following command to see how many documents are in blogs_fixed. You will know the reindexing is complete when blogs_fixed has 4,719 documents.
    GET blogs_fixed/_count
-
Let's confirm the changes worked with a simple test. Run these two queries in Console:
    GET blogs/_search
    {
      "query": {
        "match": { "authors.first_name": "kim" }
      }
    }

    GET blogs_fixed/_search
    {
      "query": {
        "match": { "authors.first_name": "kim" }
      }
    }
The first query should have 3 hits, while the second will have zero. This is because queries on keyword fields are case-sensitive (the value is compared exactly as it was indexed), while queries on text fields analyzed with the default analyzer are case-insensitive.
-
Search for "Kim" (capital K) as the authors.first_name in blogs_fixed and you should get 3 hits.
-
Search for blogs with "security analytics" as a value in tags.use_case in the blogs_fixed index. You should get 216 hits:
    GET blogs_fixed/_search
    {
      "query": {
        "match": { "tags.use_case": "security analytics" }
      }
    }
-
Run the previous query for "security analytics" on the original blogs index. You should get 598 hits. Why are there so many more hits?
Solution
The tags.use_case field in the blogs index is of type text. The query string is analyzed into separate terms, so a match query for "security analytics" is a search for "security" or "analytics". You are finding blogs that also have "business analytics" as a use case.
The tags.use_case field in the blogs_fixed index is of type keyword. When you ran the same search on the blogs_fixed index, you found only blogs that have the exact use case of "security analytics". This demonstrates the difference between the text and keyword fields.
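The difference can be illustrated with a toy model in Python. This is only a rough sketch, not how Elasticsearch is implemented: analyze is a simplified stand-in for the default analyzer (lowercase, then split into alphanumeric terms), the function names are hypothetical, and the sample values other than "security analytics" and "business analytics" are made up.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase and split into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_text(field_value, query):
    """Toy match on a text field: a hit if ANY query term appears (OR semantics)."""
    return bool(set(analyze(field_value)) & set(analyze(query)))


def match_keyword(field_value, query):
    """Toy match on a keyword field: the whole value must be identical."""
    return field_value == query


# Illustrative tags.use_case values ("observability" is a made-up example).
use_cases = ["security analytics", "business analytics", "observability"]

text_hits = [v for v in use_cases if match_text(v, "security analytics")]
keyword_hits = [v for v in use_cases if match_keyword(v, "security analytics")]

print(text_hits)     # both analytics values share the term "analytics"
print(keyword_hits)  # only the exact value matches
```

The same model accounts for the earlier first_name test: match_text("Kim", "kim") is a hit because both sides are lowercased, while match_keyword("Kim", "kim") is not.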
Summary:
In this lab, you created a new index that improves on the original blogs mapping. You will still have some improvements to make on blogs, but for now you should be familiar with the process of defining a mapping for an index.