Lab 4.2: Enriching data

Objective:

In this lab, you will add a new field to the blogs_fixed2 index. The field is populated by using an enrich processor to look up data in a separate index.

  1. Run a terms aggregation on the category field of the blogs_fixed2 index. Notice that you get five values that are not human-friendly: they are unique IDs designed to map to data in another data source.

    Solution
    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "blogs_by_category": {
          "terms": {
            "field": "category",
            "size": 10
          }
        }
      }
    }
    
  2. Let's create a new index that maps each ID to its actual category name. Run the following _bulk request, which creates a new index named categories:

    POST categories/_bulk
    {"create":{}}
    {"uid": "blt26ff0a1ade01f60d","title":"User Stories"}
    {"create":{}}
    {"uid": "bltfaae4466058cc7d6","title": "Releases"}
    {"create":{}}
    {"uid": "bltc253e0851420b088","title": "Culture"}
    {"create":{}}
    {"uid": "blt0c9f31df4f2a7a2b","title": "News"}
    {"create":{}}
    {"uid": "blt1d90b8e0edce3ea9","title": "Engineering"}
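
    Optionally, you can verify that all five category documents were indexed (this check is not required by the lab):

    GET categories/_count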
    

  3. EXAM PREP: Create an enrich policy that satisfies the following requirements.

    • the name of the policy is categories_policy
    • the match field is the uid field of the categories index
    • the enrich field is the title field
    Solution
    PUT _enrich/policy/categories_policy
    {
      "match": {
        "indices": "categories",
        "match_field": "uid",
        "enrich_fields": ["title"]
      }
    }
    
  4. Execute the policy to create an enrich index.

    Solution
    POST _enrich/policy/categories_policy/_execute
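
    If you want to double-check the policy definition before or after executing it (an optional step, not part of the lab), you can retrieve it with:

    GET _enrich/policy/categories_policy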
    
  5. EXAM PREP: Create a new ingest pipeline that satisfies the following requirements:

    • the name of the pipeline is categories_pipeline
    • uses an enrich processor with the categories_policy policy, matching on the existing category field and writing the enriched data to a new field named category_title
    • removes the original category field
    • both the enrich processor and remove processor should ignore documents that don't have a category field
    Solution

    Use the Ingest Pipelines UI in Kibana to define the pipeline. If you want to skip that step, you can copy and paste the following PUT command into Console:

    PUT _ingest/pipeline/categories_pipeline
    {
      "processors": [
        {
          "enrich": {
            "field": "category",
            "policy_name": "categories_policy",
            "target_field": "category_title",
            "ignore_missing": true
          }
        },
        {
          "remove": {
            "field": "category",
            "ignore_missing": true
          }
        }
      ]
    }
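
    Before running all your documents through the pipeline, you can test it with the _simulate API. The sample document below is an illustration (the title value is made up); its category value is one of the uids indexed in step 2, so the lookup should succeed and return a category_title of "News":

    POST _ingest/pipeline/categories_pipeline/_simulate
    {
      "docs": [
        {
          "_source": {
            "title": "A sample blog post",
            "category": "blt0c9f31df4f2a7a2b"
          }
        }
      ]
    }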
    
  6. Add an object field named category_title, with subfields title and uid (both of type keyword), to the blogs_fixed2 index mapping.

    Solution
    PUT blogs_fixed2/_mapping
    {
      "properties": {
        "category_title": {
          "properties": {
            "title": {
              "type": "keyword"
            },
            "uid": {
              "type": "keyword"
            }
          }
        }
      }
    }
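
    You can confirm that the new fields were added by retrieving the mapping (an optional check):

    GET blogs_fixed2/_mapping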
    
  7. Using _update_by_query, run all the documents in blogs_fixed2 through your categories_pipeline.

    POST blogs_fixed2/_update_by_query?pipeline=categories_pipeline&wait_for_completion=false
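
    Because wait_for_completion=false runs the update as a background task, Elasticsearch responds immediately with a task ID. You can monitor progress with the task management API; for example, the following request lists running update-by-query tasks:

    GET _tasks?detailed=true&actions=*byquery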
    

  8. Run a terms aggregation on the category_title.title field and verify you enriched the index.

    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "blogs_by_category": {
          "terms": {
            "field": "category_title.title",
            "size": 10
          }
        }
      }
    }
    

Summary:

In this lab, you learned how to enrich an index with data from another index using an enrich policy and the enrich processor.