Lab 4.2: Enriching data

Objective:

In this lab, you will add a new field to the blogs_fixed2 index. The field is populated by using an enrich processor to look up data in a separate index.

  1. Run a terms aggregation on the category field of the blogs_fixed2 index. Notice that you get five values that are not human-friendly: they are unique IDs designed to map to data in another data source.

    Solution
    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "blogs_by_category": {
          "terms": {
            "field": "category",
            "size": 10
          }
        }
      }
    }
    
  2. Let's create a new index that maps each ID to its actual category name. Run the following _bulk request, which creates a new index named categories:

    POST categories/_bulk
    {"create":{}}
    {"uid": "blt26ff0a1ade01f60d","title":"User Stories"}
    {"create":{}}
    {"uid": "bltfaae4466058cc7d6","title": "Releases"}
    {"create":{}}
    {"uid": "bltc253e0851420b088","title": "Culture"}
    {"create":{}}
    {"uid": "blt0c9f31df4f2a7a2b","title": "News"}
    {"create":{}}
    {"uid": "blt1d90b8e0edce3ea9","title": "Engineering"}
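
    Optionally, you can verify that all five category documents were indexed (this check is not required by the lab):

    GET categories/_count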
    

  3. EXAM PREP: Create an enrich policy that satisfies the following requirements.

    • the name of the policy is categories_policy
    • the match field is the uid field of the categories index
    • the enrich field is the title field
    Solution
    PUT _enrich/policy/categories_policy
    {
      "match": {
        "indices": "categories",
        "match_field": "uid",
        "enrich_fields": ["title"]
      }
    }
    
  4. Execute the policy to create an enrich index.

    Solution
    POST _enrich/policy/categories_policy/_execute
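
    If you want to double-check the policy definition before or after executing it (an optional step, not part of the lab), you can retrieve it with:

    GET _enrich/policy/categories_policy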
    
  5. EXAM PREP: Create a new ingest pipeline that satisfies the following requirements:

    • the name of the pipeline is categories_pipeline
    • uses an enrich processor with the categories_policy policy, matching on the existing category field and writing the enriched data to a new field named category_title
    • removes the original category field
    • both the enrich processor and remove processor should ignore documents that don't have a category field
    Solution

    Use the Ingest Pipelines UI in Kibana to define the pipeline. If you want to skip that step, you can copy and paste the following PUT command into Console:

    PUT _ingest/pipeline/categories_pipeline
    {
      "processors": [
        {
          "enrich": {
            "field": "category",
            "policy_name": "categories_policy",
            "target_field": "category_title",
            "ignore_missing": true
          }
        },
        {
          "remove": {
            "field": "category",
            "ignore_missing": true
          }
        }
      ]
    }
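
    Before running all your documents through the pipeline, you can test it with the _simulate API. The sample document below is an illustration (the title value is made up); its category value is one of the uids indexed in step 2, so the lookup should succeed and return a category_title of "News":

    POST _ingest/pipeline/categories_pipeline/_simulate
    {
      "docs": [
        {
          "_source": {
            "title": "A sample blog post",
            "category": "blt0c9f31df4f2a7a2b"
          }
        }
      ]
    }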
    
  6. Add an object field named category_title, with subfields title and uid (both of type keyword), to the blogs_fixed2 index mapping.

    Solution
    PUT blogs_fixed2/_mapping
    {
      "properties": {
        "category_title": {
          "properties": {
            "title": {
              "type": "keyword"
            },
            "uid": {
              "type": "keyword"
            }
          }
        }
      }
    }
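
    You can confirm that the new fields were added by retrieving the mapping (an optional check):

    GET blogs_fixed2/_mapping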
    
  7. Using _update_by_query, run all the documents in blogs_fixed2 through your categories_pipeline.

    POST blogs_fixed2/_update_by_query?pipeline=categories_pipeline&wait_for_completion=false
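
    Because wait_for_completion=false runs the update as a background task, Elasticsearch responds immediately with a task ID. You can monitor progress with the task management API; for example, the following request lists running update-by-query tasks:

    GET _tasks?detailed=true&actions=*byquery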
    

  8. Run a terms aggregation on the category_title.title field and verify you enriched the index.

    GET blogs_fixed2/_search
    {
      "size": 0,
      "aggs": {
        "blogs_by_category": {
          "terms": {
            "field": "category_title.title",
            "size": 10
          }
        }
      }
    }
    

Summary:

In this lab, you learned how to enrich an index with data from another index using an enrich policy and the enrich processor.