Lab 5.1: Metrics and buckets aggregations
Objective:
In this lab, you will become familiar with writing metrics and bucket aggregations. You will use aggregations to answer some questions about the web_traffic index.
Note: The field runtime_ms for the index web_traffic is in microsecond (μs).
What is the runtime of the fastest request? (TIP: set `size` to 0.)

Solution

The fastest request took 73.0 μs to run.

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "fastest_request_time": {
      "min": {
        "field": "runtime_ms"
      }
    }
  }
}
```
You can use the `stats` aggregation when you want to calculate all the main metrics (min, max, avg, and sum) at once. Update the aggregation above to use `stats` instead of `min`. What is the average runtime?

Solution

The average runtime is approximately 495 ms.

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "request_time_stats": {
      "stats": {
        "field": "runtime_ms"
      }
    }
  }
}
```
The median is often a better option than the average, as a single outlier can skew the average. Calculate the median runtime and verify whether 90% of the requests take less than 1 second.

Solution

You can calculate the median using the `percentiles` aggregation. The median runtime is approximately 395 milliseconds, which is much lower than the average. Furthermore, 90% of the requests take less than approximately 956 milliseconds.

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "runtime_median_and_90": {
      "percentiles": {
        "field": "runtime_ms",
        "percents": [ 50, 90 ]
      }
    }
  }
}
```
Your SLA for query runtime is 500 milliseconds. What percentage of the requests is within this time?

Solution

The `percentile_ranks` aggregation allows you to provide a value and get back the percentile it represents. Approximately 64.6% of the requests take 500 milliseconds or less. (Note that the value in the request is 500000 because `runtime_ms` is stored in microseconds.)

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "runtime_goal": {
      "percentile_ranks": {
        "field": "runtime_ms",
        "values": [ 500000 ]
      }
    }
  }
}
```
How many distinct URLs were visited? The URL paths are stored in the `url.original` field.

Solution

You should get around 12,338 distinct URLs.

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "my_url_value_count": {
      "cardinality": {
        "field": "url.original"
      }
    }
  }
}
```
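Note that `cardinality` returns an approximate count, which is why the answer above is "around" 12,338. If you are willing to trade memory for accuracy, you can set the `precision_threshold` parameter; counts below that threshold are expected to be close to exact. A sketch (the value 10000 is an arbitrary illustration, not part of the lab):

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "my_url_value_count": {
      "cardinality": {
        "field": "url.original",
        "precision_threshold": 10000
      }
    }
  }
}
```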
How many requests are there for each response code? The response codes are stored in the `http.response.status_code` field.

Solution

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "status_code_buckets": {
      "terms": {
        "field": "http.response.status_code"
      }
    }
  }
}
```
By default, the `terms` aggregation sorts buckets by `doc_count`. Change your previous search so that its terms are sorted alphabetically.

Solution

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "status_code_buckets": {
      "terms": {
        "field": "http.response.status_code",
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}
```
Write an aggregation that returns the `bytes_sent` distribution for the `web_traffic` index. Use an interval of 10000.

Solution

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "runtime_histogram": {
      "histogram": {
        "field": "bytes_sent",
        "interval": 10000
      }
    }
  }
}
```
Notice that some of the returned buckets contain few documents. Update the aggregation to exclude buckets that have fewer than 1000 documents.

Solution

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "runtime_histogram": {
      "histogram": {
        "field": "bytes_sent",
        "interval": 10000,
        "min_doc_count": 1000
      }
    }
  }
}
```
How many requests are there for each week?

Solution

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "logs_by_week": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "week"
      }
    }
  }
}
```
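As an aside, `calendar_interval` understands calendar-aware units such as `week` and `month`, whose bucket lengths vary. When you want fixed-length buckets instead, `date_histogram` also accepts a `fixed_interval`. A sketch (the aggregation name `logs_by_day` and the `1d` interval are illustrative choices, not part of the lab):

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "logs_by_day": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1d"
      }
    }
  }
}
```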
Change the previous aggregation to return the number of requests per second. What happened?

Solution

The request looks like the following:

```
GET web_traffic/_search
{
  "size": 0,
  "aggs": {
    "logs_by_week": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "second"
      }
    }
  }
}
```

Elasticsearch returns the following error:

```
Trying to create too many buckets. Must be less than or equal to: [65536] but this number of buckets was exceeded. This limit can be set by changing the [search.max_buckets] cluster level setting.
```

This means the requested aggregation would generate too many buckets, which could endanger the cluster, so Elasticsearch refuses to execute it. The maximum number of buckets allowed in a single response is limited by a dynamic cluster setting named `search.max_buckets`. As the error message shows, it defaults to 65536. Requests that would return more buckets than the limit fail with an exception.
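Because `search.max_buckets` is a dynamic cluster setting, it can be raised at runtime through the cluster settings API, although using a coarser interval is usually the better fix. A sketch (the value 100000 is an arbitrary illustration; raising the limit increases memory pressure on the cluster):

```
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 100000
  }
}
```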
Summary:
In this lab, you became familiar with writing bucket and metric aggregations.