IoT Edge 1.5 LTS is the supported release. IoT Edge 1.4 LTS is end of life as of November 12, 2024. If you are on an earlier release, see Update IoT Edge.
The IoT Edge runtime components, IoT Edge hub and IoT Edge agent, produce built-in metrics in the Prometheus exposition format. Access these metrics remotely to monitor and understand the health of an IoT Edge device.
You can use your own solution to access these metrics, or you can use the metrics-collector module, which collects the built-in metrics and sends them to Azure Monitor or Azure IoT Hub. For more information, see Collect and transport metrics.
Metrics are automatically exposed by default on port 9600 of the edgeHub and edgeAgent modules (http://edgeHub:9600/metrics and http://edgeAgent:9600/metrics). They aren't port mapped to the host by default.
To access metrics from the host, expose and map the metrics port in the module's createOptions. For example, you can map the default metrics port (9600) to port 9601 on the host.
If you map both the edgeHub and edgeAgent metrics endpoints, choose a different, unique host port number for each.
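The port mapping described above can be sketched as follows. This is a minimal illustration, not the official sample: it assumes the standard Docker container-create fields ExposedPorts and HostConfig.PortBindings, and the helper name metrics_create_options and the host port 9602 for edgeAgent are hypothetical choices made for this example.

```python
import json

METRICS_PORT = 9600  # default metrics port inside the edgeHub and edgeAgent containers


def metrics_create_options(host_port: int) -> dict:
    """Build create options that expose the metrics port and bind it to a host port.

    Hypothetical helper for illustration; the field names follow the Docker
    container-create options (ExposedPorts, HostConfig.PortBindings).
    """
    container_port = f"{METRICS_PORT}/tcp"
    return {
        "ExposedPorts": {container_port: {}},
        "HostConfig": {
            "PortBindings": {container_port: [{"HostPort": str(host_port)}]}
        },
    }


# Each module gets its own, unique host port (9602 is an assumed value).
edge_hub_options = metrics_create_options(9601)
edge_agent_options = metrics_create_options(9602)

print(json.dumps(edge_hub_options, indent=2))
```

With a mapping like this in place, the edgeHub metrics would be reachable from the host on port 9601 and the edgeAgent metrics on port 9602.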
Note
For built-in metrics to be available for collection, don't set the environment variable httpSettings__enabled to false.
Environment variables that can be used to disable metrics are listed in the azure/iotedge repo doc.
Available metrics
Metrics contain tags to help identify the nature of the metric being collected. All metrics contain the following tags:
Tag
Description
iothub
The hub the device is talking to
edge_device
The ID of the current device
instance_number
A GUID representing the current runtime. On restart, all metrics are reset. This GUID makes it easier to reconcile restarts.
In the Prometheus exposition format, there are four core metric types: counter, gauge, histogram, and summary. For more information about the different metric types, see the Prometheus metric types documentation.
The quantiles provided for the built-in histogram and summary metrics are 0.1, 0.5, 0.9 and 0.99.
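If you use your own solution to read the metrics, the text returned by a metrics endpoint has to be split into names, tags, and values. The following is a minimal sketch of such a parser, assuming the common single-line sample layout of the Prometheus exposition format. The function name parse_metrics is hypothetical, and the hub name, device ID, GUID, and values in SAMPLE are made up for illustration.

```python
import re

# One sample line: metric name, optional {tag="value",...} block, then the value.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One tag inside the braces; escaped characters in the value are kept as-is.
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, tags, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_tags, raw_value = match.groups()
        tags = dict(LABEL.findall(raw_tags or ""))
        samples.append((name, tags, float(raw_value)))  # float() also accepts NaN
    return samples


# Made-up sample text for illustration only.
SAMPLE = """\
# TYPE edgehub_gettwin_total counter
edgehub_gettwin_total{iothub="example-hub.azure-devices.net",edge_device="device1",instance_number="00000000-0000-0000-0000-000000000000",source="upstream",id="device1/module1"} 3
edgehub_message_size_bytes{iothub="example-hub.azure-devices.net",edge_device="device1",instance_number="00000000-0000-0000-0000-000000000000",id="device1/module1",quantile="0.5"} NaN
"""

for name, tags, value in parse_metrics(SAMPLE):
    print(name, tags["edge_device"], tags["instance_number"], value)
```

Grouping samples by the instance_number tag is one way to notice a runtime restart, because all metrics are reset when the GUID changes.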
The edgeHub module produces the following metrics:
Name
Dimensions
Description
edgehub_gettwin_total
source (operation source) id (module ID)
Type: counter Total number of GetTwin calls
edgehub_messages_received_total
route_output (output that sent message) id
Type: counter Total number of messages received from clients
edgehub_messages_sent_total
from (message source) to (message destination) from_route_output to_route_input (message destination input) priority (message priority to destination)
Type: counter Total number of messages sent to clients or upstream. to_route_input is empty when to is $upstream.
edgehub_reported_properties_total
target (update target) id
Type: counter Total number of reported property update calls
edgehub_message_size_bytes
id
Type: summary Message size from clients. Values may be reported as NaN if no new measurements are reported for a certain period of time (currently 10 minutes). For the summary type, the corresponding _count and _sum counters are emitted.
edgehub_gettwin_duration_seconds
source id
Type: summary Time taken for get twin operations
edgehub_message_send_duration_seconds
from to from_route_output to_route_input
Type: summary Time taken to send a message
edgehub_message_process_duration_seconds
from to priority
Type: summary Time taken to process a message from the queue
Type: gauge Current length of edgeHub's queue for a given priority
edgehub_messages_dropped_total
reason (no_route, ttl_expiry) from from_route_output
Type: counter Total number of messages removed because of reason
edgehub_messages_unack_total
reason (storage_failure) from from_route_output
Type: counter Total number of messages unacknowledged because of storage failure
edgehub_offline_count_total
id
Type: counter Total number of times edgeHub went offline
edgehub_offline_duration_seconds
id
Type: summary Time edgeHub was offline
edgehub_operation_retry_total
id operation (operation name)
Type: counter Total number of times edgeHub operations were retried
edgehub_client_connect_failed_total
id reason (not authenticated)
Type: counter Total number of times clients failed to connect to edgeHub
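Because summary quantiles such as those of edgehub_message_size_bytes may be reported as NaN after a quiet period, a consumer of these metrics usually needs to filter them out. The sketch below shows one possible way to do that and to fall back on the accompanying _sum and _count values, assuming they follow the usual Prometheus summary semantics (cumulative since the runtime instance started). The function names and all numbers are hypothetical.

```python
import math


def usable_quantiles(quantiles):
    """Drop quantiles reported as NaN (no new measurements in the recent window)."""
    return {q: v for q, v in quantiles.items() if not math.isnan(v)}


def summary_mean(total, count):
    """Mean derived from the _sum and _count values, or None if nothing was observed."""
    if count == 0 or math.isnan(total) or math.isnan(count):
        return None
    return total / count


# Made-up readings for one module: every quantile is NaN after a quiet period,
# while the _sum and _count values still describe the messages seen so far.
quiet_quantiles = {q: float("nan") for q in ("0.1", "0.5", "0.9", "0.99")}
message_size_sum = 10240.0
message_size_count = 20

print(usable_quantiles(quiet_quantiles))                  # {}
print(summary_mean(message_size_sum, message_size_count))  # 512.0
```

The mean computed this way covers every observation since the last restart, so it reacts more slowly than the quantiles do.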
The edgeAgent module produces the following metrics:
Name
Dimensions
Description
edgeAgent_total_time_running_correctly_seconds
module_name
Type: gauge The amount of time the module was specified in the deployment and was in the running state
edgeAgent_total_time_expected_running_seconds
module_name
Type: gauge The amount of time the module was specified in the deployment
edgeAgent_module_start_total
module_name, module_version
Type: counter Number of times edgeAgent asked Docker to start the module
edgeAgent_module_stop_total
module_name, module_version
Type: counter Number of times edgeAgent asked Docker to stop the module
edgeAgent_command_latency_seconds
command
Type: gauge How long it took Docker to execute the given command. Possible commands are: create, update, remove, start, stop, and restart
edgeAgent_iothub_syncs_total
Type: counter Number of times edgeAgent attempted to sync its twin with IoT Hub, both successful and unsuccessful. This number includes both edgeAgent requesting a twin and IoT Hub notifying of a twin update.
edgeAgent_unsuccessful_iothub_syncs_total
Type: counter Number of times edgeAgent failed to sync its twin with IoT Hub.
edgeAgent_deployment_time_seconds
Type: counter The amount of time it took to complete a new deployment after receiving a change.
edgeagent_direct_method_invocations_count
method_name
Type: counter Number of times a built-in edgeAgent direct method is called, such as Ping or Restart.
edgeAgent_host_uptime_seconds
Type: gauge How long the host has been on
edgeAgent_iotedged_uptime_seconds
Type: gauge How long iotedged has been running
edgeAgent_available_disk_space_bytes
disk_name, disk_filesystem, disk_filetype
Type: gauge Amount of space left on the disk
edgeAgent_total_disk_space_bytes
disk_name, disk_filesystem, disk_filetype
Type: gauge Size of the disk
edgeAgent_used_memory_bytes
module_name
Type: gauge Amount of RAM used by all processes
edgeAgent_total_memory_bytes
module_name
Type: gauge RAM available
edgeAgent_used_cpu_percent
module_name
Type: histogram Percent of CPU used by all processes
edgeAgent_created_pids_total
module_name
Type: gauge The number of processes or threads the container has created
edgeAgent_total_network_in_bytes
module_name
Type: gauge The number of bytes received from the network
edgeAgent_total_network_out_bytes
module_name
Type: gauge The number of bytes sent to network
edgeAgent_total_disk_read_bytes
module_name
Type: gauge The number of bytes read from the disk
Type: gauge General metadata about the device. The value is always 0; the information is encoded in the tags. Note that experimental_features and host_information are JSON objects. host_information looks like {"OperatingSystemType": "linux", "Architecture": "x86_64", "Version": "1.2.7", "Provisioning": {"Type": "dps.tpm", "DynamicReprovisioning": false, "AlwaysReprovisionOnStartup": false}, "ServerVersion": "20.10.11+azure-3", "KernelVersion": "5.11.0-1027-azure", "OperatingSystem": "Ubuntu 20.04.4 LTS", "NumCpus": 2, "Virtualized": "yes"}. Note that ServerVersion is the Docker version and Version is the IoT Edge security daemon version.
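One plausible way to use the edgeAgent_total_time_running_correctly_seconds and edgeAgent_total_time_expected_running_seconds gauges together is to divide the first by the second, which gives the fraction of its expected time that a module actually spent running. The sketch below assumes that reading of the two metrics. The function name module_availability, the module names, and the numbers are hypothetical.

```python
def module_availability(running_correctly_seconds, expected_running_seconds):
    """Fraction of the expected running time that the module was actually running.

    Returns None when the module has no expected running time yet.
    """
    if expected_running_seconds <= 0:
        return None
    return running_correctly_seconds / expected_running_seconds


# Made-up gauge readings per module_name:
# (total_time_running_correctly_seconds, total_time_expected_running_seconds)
readings = {
    "edgeHub": (3600.0, 3600.0),
    "exampleModule": (3420.0, 3600.0),
    "justDeployedModule": (0.0, 0.0),
}

for module_name, (running, expected) in readings.items():
    availability = module_availability(running, expected)
    if availability is None:
        print(f"{module_name}: no data yet")
    else:
        print(f"{module_name}: {availability:.1%} available")
```

Both gauges are reset when the runtime restarts, so a ratio computed this way describes the period since the current instance_number began.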