Lab 2.4: Types and parameters
Objective:
In this lab you will continue working with the blogs data to improve the functionality of the index. You can consider this entire lab as exam preparation for the Elastic Certified Engineer exam.
-
In the previous lab, you changed the tags fields (tags.elastic_stack, tags.industry, tags.level, etc.) into keyword fields. Querying all these separate fields is possible, but not optimal. Let's copy all the individual tags fields into one search_tags field that can be queried with a simple match query. At the same time, let's also apply the content_analyzer to the content field.
  - create a new index named blogs_fixed2. Use the mapping and settings of blogs_fixed as the starting point
  - add a new keyword field to the blogs_fixed2 mapping named search_tags
  - using copy_to, copy the values of all the tags fields to search_tags
  - disable doc values for the new search_tags field (it will not be used for sorting or aggregations)
  - completely disable the authors.uid field
  - apply the custom content_analyzer from lab 2.3 to the content field (don't forget that the analyzer needs to be defined in the settings!)
Solution
PUT blogs_fixed2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "content_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"],
          "char_filter": ["html_strip"]
        }
      }
    }
  },
  "mappings": {
    "_meta": { "created_by": "Elastic Student" },
    "properties": {
      "authors": {
        "properties": {
          "company": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "first_name": { "type": "keyword" },
          "full_name": { "type": "text" },
          "job_title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "last_name": { "type": "keyword" },
          "uid": { "enabled": false }
        }
      },
      "category": { "type": "keyword" },
      "content": { "type": "text", "analyzer": "content_analyzer" },
      "locale": { "type": "keyword" },
      "publish_date": { "type": "date", "format": "iso8601" },
      "search_tags": { "type": "keyword", "doc_values": false },
      "tags": {
        "properties": {
          "elastic_stack": { "type": "keyword", "copy_to": "search_tags" },
          "industry": { "type": "keyword", "copy_to": "search_tags" },
          "level": { "type": "keyword", "copy_to": "search_tags" },
          "product": { "type": "keyword", "copy_to": "search_tags" },
          "tags": { "type": "keyword", "copy_to": "search_tags" },
          "topic": { "type": "keyword", "copy_to": "search_tags" },
          "use_case": { "type": "keyword", "copy_to": "search_tags" },
          "use_cases": { "type": "keyword", "copy_to": "search_tags" }
        }
      },
      "title": { "type": "text" },
      "url": { "type": "keyword" }
    }
  }
}
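The same copy_to setting repeats on all eight tags subfields, so generating that part of the mapping can be less error-prone than typing it by hand. The Python sketch below is only an illustration and is not part of the official solution. The helper name build_tag_mappings is hypothetical, and the field list is copied from the solution mapping above. It prints a JSON fragment for search_tags and tags that could be pasted into the PUT blogs_fixed2 request.

```python
import json

# Tag subfields, copied from the blogs_fixed2 solution mapping.
TAG_FIELDS = [
    "elastic_stack", "industry", "level", "product",
    "tags", "topic", "use_case", "use_cases",
]


def build_tag_mappings(target: str = "search_tags") -> dict:
    """Hypothetical helper: return the search_tags and tags mapping fragments.

    Every tags subfield becomes a keyword that copies its value into
    `target`; `target` itself is a keyword with doc values disabled.
    """
    tags = {
        name: {"type": "keyword", "copy_to": target}
        for name in TAG_FIELDS
    }
    return {
        target: {"type": "keyword", "doc_values": False},
        "tags": {"properties": tags},
    }


if __name__ == "__main__":
    # json.dumps renders Python False as JSON false.
    print(json.dumps(build_tag_mappings(), indent=2))
```

Generating the fragment keeps the list of copied fields in one place. If another tags subfield is added later, appending it to TAG_FIELDS is enough.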
-
Reindex all the blogs into blogs_fixed2.
Solution
POST _reindex
{
  "source": { "index": "blogs" },
  "dest": { "index": "blogs_fixed2" }
}
-
In lab 2.3, you queried the blogs index for "quot" in the content field and found 826 hits. Repeat the query on the blogs_fixed2 index:
GET blogs_fixed2/_search
{
  "query": {
    "match": { "content": "quot" }
  }
}
You will now find only one hit. This blog post actually talks about HTML, so it is a good match. In the other blogs, the html_strip character filter in the custom analyzer decoded the &quot; HTML entities into quotation marks before tokenization. As a result, no "quot" token was indexed in the content field of blogs_fixed2.
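To see why the hit count drops, compare the two analysis chains side by side. The Python sketch below is a rough approximation, not Elasticsearch's implementation. A regular expression stands in for the standard tokenizer, and tag removal plus html.unescape stands in for the html_strip character filter. The function names are hypothetical. For the actual token output, run the _analyze API with content_analyzer against blogs_fixed2.

```python
import html
import re

# Rough stand-in for html_strip: remove tags, then decode entities.
TAG_RE = re.compile(r"<[^>]+>")
# Rough stand-in for the standard tokenizer: runs of word characters.
# The real tokenizer follows Unicode text segmentation and differs on
# many inputs; this is only close enough for the &quot; case.
WORD_RE = re.compile(r"\w+")


def standard_like(text: str) -> list[str]:
    """Approximate the default standard analyzer: tokenize, then lowercase."""
    return [t.lower() for t in WORD_RE.findall(text)]


def content_analyzer_like(text: str) -> list[str]:
    """Approximate content_analyzer: html_strip, standard tokenizer, lowercase."""
    stripped = html.unescape(TAG_RE.sub(" ", text))
    return standard_like(stripped)


if __name__ == "__main__":
    sample = "<p>Use the &quot;Logstash&quot; pipeline</p>"
    print(standard_like(sample))          # contains "quot" twice
    print(content_analyzer_like(sample))  # no "quot" token left
```

Without the character filter, the entity "&quot;" splits into the word "quot" plus punctuation, so every quoted phrase in the raw HTML adds a "quot" token.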
-
Run a match query on the search_tags field for the value "logstash". You should get 381 results. Examine a few of the results to see which tag section actually contains the value logstash:
GET blogs_fixed2/_search
{
  "query": {
    "match": { "search_tags": "logstash" }
  }
}
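A single match query on search_tags finds blogs regardless of which tags subfield holds the value, because copy_to copies each value into search_tags at index time. The Python toy model below sketches that behavior under simplifying assumptions. The documents are made up, the helper names are hypothetical, and matching is exact because search_tags is a keyword field. copy_to does not modify _source, so the hits contain no search_tags entry. That is why you have to inspect the tags section to see where logstash came from.

```python
# Toy model of copy_to: at index time, values from every tags subfield
# are also indexed into search_tags. _source itself is left unchanged.

TAG_FIELDS = ["elastic_stack", "industry", "level", "product",
              "tags", "topic", "use_case", "use_cases"]


def search_tags_of(doc: dict) -> list[str]:
    """Collect the values copy_to would index into search_tags."""
    values = []
    tags = doc.get("tags", {})
    for field in TAG_FIELDS:
        value = tags.get(field)
        if value is None:
            continue
        # A field may hold a single value or an array of values.
        values.extend(value if isinstance(value, list) else [value])
    return values


def match_search_tags(docs: list[dict], term: str) -> list[str]:
    """Exact-term match on the keyword field, like a match query on search_tags."""
    return [d["id"] for d in docs if term in search_tags_of(d)]


# Made-up sample documents: the term sits in a different subfield each time.
docs = [
    {"id": "a", "tags": {"elastic_stack": "logstash"}},
    {"id": "b", "tags": {"product": ["logstash", "beats"]}},
    {"id": "c", "tags": {"use_case": "logstash"}},
    {"id": "d", "tags": {"topic": "security"}},
]

if __name__ == "__main__":
    print(match_search_tags(docs, "logstash"))
```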
-
Run the following aggregation on the search_tags field:
GET blogs_fixed2/_search
{
  "size": 0,
  "aggs": {
    "top_job_titles": {
      "terms": { "field": "search_tags", "size": 10 }
    }
  }
}
The result is an error. Why?
Solution
The search_tags field does not have doc values enabled. A terms aggregation on a keyword field reads each document's values from doc values, so without them Elasticsearch cannot aggregate on that field and returns an error.
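It can help to picture the two data structures involved. The inverted index maps terms to documents, which is what a query needs. A terms aggregation goes the other way: for each matching document, it needs that document's values, which is what doc values store. The Python toy model below is a simplification based on that picture. The class and method names are hypothetical, and Elasticsearch stores both structures very differently on disk. The model keeps both structures and refuses to aggregate when doc values are disabled.

```python
from collections import Counter, defaultdict


class ToyKeywordField:
    """Hypothetical model of a keyword field: inverted index plus optional doc values."""

    def __init__(self, doc_values: bool = True):
        self.inverted = defaultdict(set)  # term -> ids of docs containing it
        self.columns = {} if doc_values else None  # doc id -> its values

    def index(self, doc_id: str, values: list[str]) -> None:
        for value in values:
            self.inverted[value].add(doc_id)
        if self.columns is not None:
            self.columns[doc_id] = list(values)

    def match(self, term: str) -> set[str]:
        # A query only needs the term -> docs direction.
        return set(self.inverted.get(term, set()))

    def terms_agg(self, doc_ids: set[str], size: int = 10) -> list[tuple[str, int]]:
        # An aggregation walks the matching docs and needs doc -> values.
        if self.columns is None:
            raise ValueError("doc values are disabled; cannot aggregate on this field")
        counts = Counter()
        for doc_id in doc_ids:
            counts.update(set(self.columns.get(doc_id, [])))
        return counts.most_common(size)


if __name__ == "__main__":
    field = ToyKeywordField(doc_values=False)
    field.index("1", ["logstash", "kibana"])
    field.index("2", ["logstash"])
    hits = field.match("logstash")
    print(sorted(hits))  # searching still works
    try:
        field.terms_agg(hits)
    except ValueError as err:
        print("aggregation failed:", err)
```

This trade-off explains why disabling doc values on search_tags saves disk space without hurting the match query from the previous step.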
-
Run the following aggregation on the authors.uid field:
GET blogs_fixed2/_search
{
  "size": 0,
  "aggs": {
    "top_author_uids": {
      "terms": { "field": "authors.uid", "size": 10 }
    }
  }
}
Why does this aggregation not return any results?
Solution
The authors.uid field has been disabled. Elasticsearch skips parsing its contents entirely, so the value is neither indexed nor stored as doc values. It can only be retrieved from _source. From a query and aggregation perspective, it's as if that field does not exist.
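The Python toy model below sketches what "enabled": false means at index time. The value stays in the stored _source, but the indexer skips it, so nothing is left to search or aggregate on. The function name and the dotted field paths are hypothetical. The model also assumes authors is a single object rather than an array of objects.

```python
import copy


def index_document(source: dict, disabled: set[str]) -> tuple[dict, dict]:
    """Return (stored _source, indexed fields) for a toy indexer.

    Paths in `disabled` (e.g. "authors.uid") are kept in _source but are
    never parsed into the index, mimicking "enabled": false.
    Assumes nested values are plain objects, not arrays of objects.
    """
    indexed = {}

    def walk(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if path in disabled:
                continue  # skipped entirely: no terms, no doc values
            if isinstance(value, dict):
                walk(value, path)
            else:
                indexed[path] = value

    walk(source, "")
    # _source is stored exactly as sent, including disabled fields.
    return copy.deepcopy(source), indexed


if __name__ == "__main__":
    doc = {"title": "Intro", "authors": {"full_name": "Jane Doe", "uid": "42"}}
    stored, indexed = index_document(doc, {"authors.uid"})
    print(stored)
    print(indexed)
```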
Summary:
In this lab, you explored several mapping parameters, such as analyzer, copy_to, doc_values, and enabled.