Metrics
Module for collecting application metrics using Micrometer.
Providing metrics in prometheus format requires the private HTTP server module to be added.
Dependency
Dependency build.gradle:
Module:
Dependency build.gradle.kts:
Module:
Configuration
Example configuration of the HTTP server path for retrieving metrics, as described in the HttpServerConfig class (default values are shown):
- Path for retrieving metrics in prometheus format (if the HTTP server module is added)
Example of the complete configuration described in the MetricsConfig class (default values are specified):
Parameters for configuring metrics collection are described in the documentation of the modules that collect metrics, e.g. HTTP server and HTTP client.
Usage
We follow, and encourage you to use, the notation described in the specification.
Once the module is connected, a PrometheusMeterRegistry is registered in Metrics.globalRegistry and used by all components that collect metrics.
Personalization
To change the PrometheusMeterRegistry configuration, add a PrometheusMeterRegistryInitializer to the container.
Important: PrometheusMeterRegistryInitializer is applied only once, when the application is initialized.
For example, we want to add a common tag for all metrics:
Standard metrics have some configuration options, such as ServiceLayerObjectives for DistributionSummary metrics.
The configuration field names can be viewed in ru.tinkoff.kora.micrometer.module.MetricsConfig.
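To make the `_count / _sum / _bucket / _max` series in the reference tables more concrete, here is a minimal toy model of a distribution summary with fixed bucket boundaries. It assumes that ServiceLayerObjectives corresponds to a list of bucket upper bounds and that buckets are cumulative, as in the Prometheus histogram format. The class name, the boundary values, and the all-time maximum are illustrative assumptions, not Kora or Micrometer code.

```java
import java.util.Arrays;

/** Toy model of a distribution summary with fixed buckets (hypothetical, not Micrometer code). */
final class SloBucketsSketch {
    private final double[] boundaries; // bucket upper bounds, ascending (illustrative values)
    private final long[] bucketCounts; // cumulative: number of values <= boundaries[i]
    private long count;
    private double sum;
    private double max;

    SloBucketsSketch(double... boundaries) {
        this.boundaries = boundaries.clone();
        Arrays.sort(this.boundaries);
        this.bucketCounts = new long[this.boundaries.length];
    }

    /** Records one non-negative observation, e.g. a request duration in milliseconds. */
    void record(double value) {
        count++;
        sum += value;
        // Simplification: an all-time maximum; real registries may use a time window.
        max = Math.max(max, value);
        // Cumulative buckets: a value is counted in every bucket whose bound is >= the value.
        for (int i = 0; i < boundaries.length; i++) {
            if (value <= boundaries[i]) {
                bucketCounts[i]++;
            }
        }
    }

    long count() { return count; }

    double sum() { return sum; }

    double max() { return max; }

    long[] bucketCounts() { return bucketCounts.clone(); }

    public static void main(String[] args) {
        SloBucketsSketch summary = new SloBucketsSketch(10, 50, 100);
        for (double duration : new double[] {4, 12, 75, 250}) {
            summary.record(duration);
        }
        // One counter per boundary, followed by the aggregate values.
        System.out.println(Arrays.toString(summary.bucketCounts()));
        System.out.println(summary.count() + " " + summary.sum() + " " + summary.max());
    }
}
```

In this model an observation above the largest boundary (250 here) only affects the count, sum, and maximum, which is why boundaries are usually chosen around the latencies that matter for the service.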
Standard
The original metrics format followed the OpenTelemetry V120 standard. After Kora 1.1.0, it became possible to provide metrics in the OpenTelemetry V123 standard. A partial list of changes can be found in the OpenTelemetry documentation and the OpenTelemetry migration guidelines.
Metrics Reference
All Kora metrics use OpenTelemetry semantic conventions for naming and tags.
Micrometer metric types used:
- DistributionSummary — used for collecting distributions of arbitrary values. This metric type enables efficient data visualization across buckets and percentile calculation.
- Counter — monotonically increasing counter
- Gauge — current metric value
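The Metric and Prometheus columns in the reference tables differ in a regular way: dots become underscores, a base unit may be appended, and counters end in `_total`. The following is a simplified sketch of that mapping, inferred from the names listed in this document. The class, enum, and method names are hypothetical, and the real naming convention applied by the registry handles more cases than shown here.

```java
/** Simplified sketch of the dotted-name to Prometheus-name mapping (hypothetical helper). */
final class PrometheusNameSketch {

    /** Meter kinds mirroring the three Micrometer types listed above. */
    enum Kind { DISTRIBUTION_SUMMARY, COUNTER, GAUGE }

    /**
     * @param meterName dotted name, e.g. "http.server.request.duration"
     * @param kind      meter type
     * @param baseUnit  base unit such as "milliseconds" or "bytes"; null or empty if none
     */
    static String toPrometheusName(String meterName, Kind kind, String baseUnit) {
        // Dots are replaced with underscores.
        String name = meterName.replace('.', '_');

        // The base unit is appended unless the name already ends with it.
        if (baseUnit != null && !baseUnit.isEmpty() && !name.endsWith("_" + baseUnit)) {
            name = name + "_" + baseUnit;
        }

        // Counters carry the conventional "_total" suffix.
        if (kind == Kind.COUNTER && !name.endsWith("_total")) {
            name = name + "_total";
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("http.server.request.duration", Kind.DISTRIBUTION_SUMMARY, "milliseconds"));
        System.out.println(toPrometheusName("rpc.server.requests_per_rpc", Kind.COUNTER, null));
        System.out.println(toPrometheusName("jvm.gc.memory.allocated", Kind.COUNTER, "bytes"));
        System.out.println(toPrometheusName("http.server.active_requests", Kind.GAUGE, null));
    }
}
```

A helper like this can be handy when writing dashboards or alert rules from the dotted names, but the names actually exposed on the metrics path remain the source of truth.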
HTTP Server
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| http.server.request.duration | http_server_request_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | HTTP server request processing duration | http.request.method, http.response.status_code, http.route, url.scheme, server.address, error.type |
| http.server.active_requests | http_server_active_requests | Gauge | Number of active HTTP requests | http.request.method, http.route, server.address, url.scheme |
See HTTP Server module documentation for more details.
HTTP Client
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| http.client.request.duration | http_client_request_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | HTTP client request duration | http.request.method, http.response.status_code, server.address, url.scheme, http.route, error.type |
See HTTP Client module documentation for more details.
Database
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| db.client.request.duration | db_client_request_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Database operation/query duration | db.pool.name, db.statement, db.operation, error.type |
See Database module documentation for more details.
Kafka
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| messaging.receive.duration | messaging_receive_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Single message processing duration | messaging.system, messaging.destination, messaging.operation, error.type |
| messaging.publish.duration | messaging_publish_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Message send duration | messaging.system, messaging.destination, messaging.partition_id, error.type |
| messaging.process.batch.duration | messaging_process_batch_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Message batch processing duration | messaging.system, messaging.destination, error.type |
| messaging.kafka.consumer.lag | messaging_kafka_consumer_lag | Gauge | Consumer lag per partition | messaging.system, messaging.destination, messaging.partition_id, messaging.consumer_group |
See Kafka module documentation for more details.
gRPC Server
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| rpc.server.duration | rpc_server_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | gRPC server call processing duration | rpc.service, rpc.method, rpc.status, error.type |
| rpc.server.requests_per_rpc | rpc_server_requests_per_rpc_total | Counter | Number of requests received per RPC | rpc.service, rpc.method |
| rpc.server.responses_per_rpc | rpc_server_responses_per_rpc_total | Counter | Number of responses sent per RPC | rpc.service, rpc.method |
See gRPC Server module documentation for more details.
gRPC Client
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| rpc.client.duration | rpc_client_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | gRPC client call duration | rpc.service, rpc.method, rpc.status, error.type, server.address |
| rpc.client.requests_per_rpc | rpc_client_requests_per_rpc_total | Counter | Number of requests sent per RPC | rpc.service, rpc.method, server.address |
| rpc.client.responses_per_rpc | rpc_client_responses_per_rpc_total | Counter | Number of responses received per RPC | rpc.service, rpc.method, server.address |
See gRPC Client module documentation for more details.
SOAP Client
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| rpc.client.duration | rpc_client_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | SOAP client call duration | rpc.system, rpc.service, rpc.method, rpc.result, server.address, server.port |
See SOAP Client module documentation for more details.
Scheduling
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| scheduling.job.duration | scheduling_job_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Scheduled job execution duration | code.class, code.function, error.type |
See Scheduling module documentation for more details.
Cache
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| cache.duration | cache_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Cache operation duration (GET, SET, DELETE, etc.) | cache, operation, origin, status |
| cache.ratio | cache_ratio_total | Counter | Cache hit/miss counter | cache, origin, type |

Standard Micrometer metrics are automatically registered when using Caffeine:
| Metric | Prometheus | Type | Description |
|---|---|---|---|
| cache.gets | cache_gets_total | Counter | Number of cache requests |
| cache.puts | cache_puts_total | Counter | Number of cache writes |
| cache.evictions | cache_evictions_total | Counter | Number of cache evictions |
| cache.size | cache_size | Gauge | Current cache size |
See Cache module documentation for more details.
Redis / Lettuce
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| lettuce.command.completion.duration | lettuce_command_completion_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Redis command completion duration | type, remote, local, command, error.type |
| lettuce.command.firstresponse.duration | lettuce_command_firstresponse_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Redis command first response duration | type, remote, local, command, error.type |
Resilience
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| resilient.circuitbreaker.state | resilient_circuitbreaker_state | Gauge | Circuit breaker state (0=CLOSED, 1=HALF_OPEN, 2=OPEN) | name |
| resilient.circuitbreaker.transition | resilient_circuitbreaker_transition_total | Counter | Circuit breaker state transitions | name, state |
| resilient.circuitbreaker.call.acquire | resilient_circuitbreaker_call_acquire_total | Counter | Circuit breaker call acquire attempts/rejections | name, state, status |
| resilient.retry.attempts | resilient_retry_attempts_total | Counter | Number of retry attempts | name |
| resilient.retry.exhausted | resilient_retry_exhausted_total | Counter | Number of exhausted retries | name |
| resilient.timeout.exhausted | resilient_timeout_exhausted_total | Counter | Number of timeouts | name |
| resilient.fallback.attempts | resilient_fallback_attempts_total | Counter | Number of fallback invocations | name, type |
See Resilience module documentation for more details.
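Because resilient.circuitbreaker.state is a numeric gauge, consumers of the metric have to translate the sample back into a state. The sketch below shows one way to do that using the 0=CLOSED, 1=HALF_OPEN, 2=OPEN encoding from the table above. The enum and its methods are hypothetical helpers for illustration and are not part of Kora.

```java
/** Hypothetical helper for decoding the resilient.circuitbreaker.state gauge. */
enum CircuitBreakerStateSketch {
    CLOSED(0), HALF_OPEN(1), OPEN(2);

    private final int gaugeValue;

    CircuitBreakerStateSketch(int gaugeValue) {
        this.gaugeValue = gaugeValue;
    }

    int gaugeValue() {
        return gaugeValue;
    }

    /** Maps a scraped gauge sample (gauge samples are doubles) back to a state. */
    static CircuitBreakerStateSketch fromGauge(double sample) {
        if (Double.isNaN(sample)) {
            throw new IllegalArgumentException("Gauge sample is NaN");
        }
        long rounded = Math.round(sample);
        for (CircuitBreakerStateSketch state : values()) {
            if (state.gaugeValue == rounded) {
                return state;
            }
        }
        throw new IllegalArgumentException("Unexpected gauge value: " + sample);
    }
}
```

For example, `CircuitBreakerStateSketch.fromGauge(2.0)` yields `OPEN`, which is the value an alert on a tripped circuit breaker would typically watch for.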
JMS
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| messaging.receive.duration | messaging_receive_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | JMS message receive duration | messaging.system, messaging.destination.name, error.type |
S3 Client
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| s3.client.duration | s3_client_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | S3 HTTP request duration | aws.s3.bucket, aws.operation.name, error.type |
| s3.kora.client.duration | s3_kora_client_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Kora S3 client operation duration | aws.client.name, aws.s3.bucket, aws.operation.name, error.type |
See S3 Client module documentation for more details.
Camunda 7 BPMN
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| camunda.engine.delegate.duration | camunda_engine_delegate_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Camunda BPMN Java delegate execution duration | delegate, business.key, error.type |
| camunda.engine.delegate.active_requests | camunda_engine_delegate_active_requests | Gauge | Number of active delegate executions | delegate, business.key |
See Camunda 7 BPMN module documentation for more details.
Camunda REST
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| camunda.rest.server.request.duration | camunda_rest_server_request_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Camunda REST request duration | http.request.method, http.response.status_code, http.route, url.scheme, server.address, error.type |
| camunda.rest.server.active_requests | camunda_rest_server_active_requests | Gauge | Number of active Camunda REST requests | http.route, http.request.method, server.address, url.scheme |
See Camunda 7 REST module documentation for more details.
Camunda 8 Worker
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| zeebe.worker.handler.duration | zeebe_worker_handler_duration_milliseconds / _count / _sum / _bucket / _max | DistributionSummary | Zeebe worker job handler duration | job.name, job.type, status, error, error.code |
| zeebe.worker.handler | zeebe_worker_handler_total | Counter | Zeebe worker error counter | job.name, job.type, status, error.code |
| zeebe.client.worker.job | zeebe_client_worker_job_total | Counter | Number of activated/handled Zeebe jobs | action, type |
See Camunda 8 Worker module documentation for more details.
System
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| kora.up | kora_up | Gauge | Framework status indicator (value = 1) | version |
JVM
Standard JVM metrics are collected automatically via Micrometer:
| Metric | Prometheus | Type | Description | Tags |
|---|---|---|---|---|
| jvm.gc.pause | jvm_gc_pause_milliseconds / _count / _sum / _max | DistributionSummary | GC pause duration | action, cause |
| jvm.gc.memory.allocated | jvm_gc_memory_allocated_bytes_total | Counter | Allocated memory size | - |
| jvm.gc.memory.promoted | jvm_gc_memory_promoted_bytes_total | Counter | Memory promoted to old gen | - |
| jvm.gc.max.data.size | jvm_gc_max_data_size_bytes | Gauge | Max old gen size | - |
| jvm.gc.live.data.size | jvm_gc_live_data_size_bytes | Gauge | Old gen size after full GC | - |
| jvm.memory.used | jvm_memory_used_bytes | Gauge | Used memory | area, id |
| jvm.memory.committed | jvm_memory_committed_bytes | Gauge | Committed JVM memory | area, id |
| jvm.memory.max | jvm_memory_max_bytes | Gauge | Max available memory | area, id |
| jvm.threads.live | jvm_threads_live_threads | Gauge | Number of live threads | - |
| jvm.threads.daemon | jvm_threads_daemon_threads | Gauge | Number of daemon threads | - |
| jvm.threads.peak | jvm_threads_peak_threads | Gauge | Peak thread count | - |
| jvm.threads.states | jvm_threads_states_threads | Gauge | Thread count by state | state |
| process.cpu.usage | process_cpu_usage | Gauge | Process CPU usage | - |
| system.cpu.usage | system_cpu_usage | Gauge | System CPU usage | - |
| system.cpu.count | system_cpu_count | Gauge | Number of available processors | - |
| logback.events | logback_events_total | Counter | Logging event count | level |
| jvm.classes.loaded | jvm_classes_loaded_classes | Gauge | Number of loaded classes | - |
| jvm.classes.unloaded | jvm_classes_unloaded_classes_total | Counter | Number of unloaded classes | - |
| process.files.open | process_files_open_files | Gauge | Number of open file descriptors | - |
| process.files.max | process_files_max_files | Gauge | Max file descriptors | - |
| process.uptime | process_uptime_milliseconds | Gauge | Process uptime | - |