Monitor Decodable system health and activity

Decodable logs audit events and metrics about itself in the _events and _metrics streams. You can use these events and metrics to monitor user activity, health, and the state of your Decodable resources.

In general, the _events stream helps you answer questions like "Who deactivated this pipeline, and when?" It also contains state transition information, which helps you answer questions like "Were there any connectivity issues while this connection was running?"

The _metrics stream can help you answer questions like “Are we using a large enough task size for the throughput that our connection is experiencing?” With the information contained in these two streams, you can get a holistic view of your Decodable system at large.

You can preview these streams in Decodable or use a connector to send these events and metrics to downstream databases or observability systems such as Elasticsearch or Datadog. As with other Decodable streams, you can also build pipelines that read from these streams to run aggregations or trigger alerts based on specific conditions in the data. For more details about the type of information that you can find in the _events and _metrics streams, see System Events and Metrics.

In addition, you can get a high-level view of the status of Decodable pipelines and connections from their respective pages in Decodable Web. You can also view additional details about a specific pipeline or connection, such as the amount of data that it’s receiving and what’s connected to it, by selecting its name.

Connection and pipeline statuses and what they mean

You can see the status of a connection or pipeline from the Connections or Pipelines pages in Decodable Web, or by running the Decodable CLI command decodable connections get <connection_id> or decodable pipelines get <pipeline_id>. Every connection and pipeline has an Actual State and a Target State, where Actual State is the current state of the resource and Target State is the desired end state.

The following table describes all of the statuses that a Decodable resource can have. Note that there are only two statuses that can appear as a Target State: Running and Stopped.

Status	Description
Running	The resource is active and connected to a Decodable stream. When a connection or pipeline has this status, it’s ready to receive, process, or send data.
Stopped	The resource isn’t active. It might be connected to a Decodable stream, but it’s not actively sending data through it.
Starting	The resource is preparing to enter the running state but isn’t running yet.
Pausing	The resource is preparing to enter the paused state because the pause condition is met. This state is only applicable when Scale to Zero is configured.
Paused	The resource is paused and will be resumed based on the resume condition. This state is only applicable when Scale to Zero is configured.
Stopping	The resource is preparing to be stopped.
Retrying	Decodable is actively attempting to restart the resource from the latest checkpoint due to an incident. Various factors can lead to this situation, such as invalid data encountered by a source connection or the temporary unavailability of an internal Decodable component. Decodable keeps trying to restore the impacted resource until you explicitly tell it to stop. An error message with details is shown in the detailed view of the resource.
Failed	Something is wrong with the resource and no data is being processed through it. There are many different reasons why this can occur. Manual intervention is required to restore it. For assistance, contact Decodable support with the detailed error message shown.
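
The relationship between Actual State and Target State can be sketched in code. The following is a minimal illustration, not part of any Decodable SDK: the Status enum mirrors the table above, while the assess helper and the labels it returns are hypothetical names chosen for this example.

```python
from enum import Enum


class Status(Enum):
    """Statuses from the table above."""
    RUNNING = "Running"
    STOPPED = "Stopped"
    STARTING = "Starting"
    PAUSING = "Pausing"
    PAUSED = "Paused"
    STOPPING = "Stopping"
    RETRYING = "Retrying"
    FAILED = "Failed"


# Only these two statuses can appear as a Target State.
VALID_TARGET_STATES = {Status.RUNNING, Status.STOPPED}


def assess(actual: Status, target: Status) -> str:
    """Return a coarse label for a resource (hypothetical helper and labels)."""
    if target not in VALID_TARGET_STATES:
        raise ValueError(f"{target.value} cannot be a Target State")
    if actual is Status.FAILED:
        return "needs manual intervention"
    if actual is Status.RETRYING:
        return "recovering; check the error message"
    if actual in {Status.PAUSING, Status.PAUSED}:
        return "paused or pausing (Scale to Zero)"
    if actual is target:
        return "settled"
    # Starting or Stopping, or any other mismatch with the target.
    return "in transition"
```

For example, assess(Status.STARTING, Status.RUNNING) returns "in transition", and assess(Status.FAILED, Status.RUNNING) returns "needs manual intervention".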

View data flow information about a connection or pipeline

Every Decodable connection and pipeline measures data flow metrics as it processes data. The Overview page for a specific connection or pipeline shows which streams are connected to it, how many tasks are assigned to it, and how much data is flowing through it. Use the Overview page to review the inbound and outbound data metrics of your Decodable resource and confirm that data is flowing as expected.

You can also view all available metrics for every active connection and pipeline in the Decodable account by previewing the _metrics stream. See System Events and Metrics for more information about the available metrics and metric properties in the _metrics stream.

To access the Overview page of a specific resource, do the following:

Navigate to the Connections page or Pipelines page, depending on which resource you want to see metrics for.
Select the row that represents the resource that you want to inspect. The Overview page for that resource opens.

The Overview page of a Decodable resource displays the following data flow information. These metrics are shown in real time, so the numbers fluctuate.

Metrics are reset upon each activation. Stopping and restarting a connection or pipeline resets metrics back to zero. Metrics are also reset if a connection or pipeline enters a retrying state.

Resource	Metrics displayed
Source Connections	The amount of data that this connection is sending to the connected stream, measured in both bytes per second and records per second. The total amount of data that this connection has sent to the connected stream since it was last activated or restarted, measured in both bytes and records. Note: Output metrics for the REST Connector and the Datagen Connector aren’t supported.
Sink Connections	The amount of data that this connection is receiving from the connected stream, measured in both bytes per second and records per second. The total amount of data that this connection has received from the connected stream since it was last activated or restarted, measured in both bytes and records. The total number of records that are ready for processing but haven’t been processed yet. If this metric shows a high number of records, you may want to increase the task count for the connection.
Pipelines	The number of streams that are connected to this pipeline. You can choose to see input metrics for all streams connected to this pipeline or for one specific stream. The amount of outbound data that the pipeline is sending to a stream. By comparing this metric to the inbound data that the pipeline is receiving, you can see how much data the pipeline is filtering out. The total number of records that are ready for processing but haven’t been processed yet. If this metric shows a high number of records, you may want to increase the task count for the pipeline.
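
The comparison of inbound and outbound pipeline data described in the table can be expressed as a simple ratio. The following is a minimal sketch: filtered_fraction is a hypothetical helper, and it assumes that you have read the inbound and outbound record rates for the same moment from the Overview page or the _metrics stream.

```python
def filtered_fraction(records_in: float, records_out: float) -> float:
    """Fraction of inbound records that did not reach the output.

    Both arguments are record rates (or counts) taken over the same
    period. The result is clamped at 0.0 because a pipeline can also
    emit more records than it receives.
    """
    if records_in <= 0:
        return 0.0
    return max(0.0, 1.0 - records_out / records_in)
```

A pipeline that receives 1000 records per second and emits 250 has a filtered fraction of 0.75. The figure is only meaningful for filter-style pipelines, because aggregations and joins also change record counts.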

Monitor job progress

Flink periodically creates checkpoints (default: every 10 seconds) for fault tolerance and recovery after a failure. By monitoring the checkpoints_completed and checkpoints_failed metrics, you can determine whether the job is progressing as expected.

Under normal conditions, the checkpoints_completed metric should increase steadily for a running job. If the value stalls while checkpoints_failed is increasing, the job is unable to make progress and requires troubleshooting.

Note that checkpoints_failed may increase temporarily while a job initializes, which can take a few minutes, or during internal job restarts. No intervention is required in these cases.
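
The stall condition described above can be checked over a series of metric samples. The following is a minimal sketch rather than a Decodable feature: CheckpointSample, job_is_stalled, and the grace_samples parameter are hypothetical names, and the sketch assumes you collect the two metric values yourself at regular intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointSample:
    completed: int  # value of checkpoints_completed at sample time
    failed: int     # value of checkpoints_failed at sample time


def job_is_stalled(samples: list[CheckpointSample], grace_samples: int = 0) -> bool:
    """True if completed stayed flat while failed increased over the window.

    grace_samples skips the oldest samples, so that failures during job
    initialization or an internal restart are not reported as a stall.
    """
    window = samples[grace_samples:]
    if len(window) < 2:
        return False
    first, last = window[0], window[-1]
    return last.completed == first.completed and last.failed > first.failed
```

The check compares only the first and last sample in the window, so a longer window makes it less sensitive to short-lived failures.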

Detecting processing lag

Monitor your pipeline metrics, in particular the records_lag_total metric. If this value keeps increasing, the pipeline can’t keep up with the rate at which messages are being produced. To resolve this, increase the number of tasks, the task size, or both on the pipeline.
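
One way to decide whether lag "keeps increasing" is to require a run of strictly increasing readings. The following is a minimal sketch under that assumption: lag_keeps_growing and min_samples are hypothetical names, and the readings are records_lag_total values that you sample at a fixed interval.

```python
def lag_keeps_growing(lag_samples: list[int], min_samples: int = 5) -> bool:
    """True if the most recent min_samples readings are strictly increasing."""
    if min_samples < 2:
        raise ValueError("min_samples must be at least 2")
    if len(lag_samples) < min_samples:
        return False
    recent = lag_samples[-min_samples:]
    # Compare each reading with the one that follows it.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A single dip in the readings resets the verdict, which keeps short bursts of traffic from triggering an alert. A larger min_samples value makes the check more conservative.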

Send Decodable metrics to Datadog for enhanced observability

Datadog is a monitoring and observability platform that lets you keep track of all aspects of your cloud environment in one centralized location. Decodable provides an out-of-the-box Datadog connector, which makes it easier to monitor your Decodable accounts in your Datadog dashboards. With the Datadog connector, you can send data from Decodable’s _metrics stream, which is where Decodable writes metrics about the data that it’s processing. You can also send metrics from your own custom metrics stream, as long as that stream has metric name, metric value, and timestamp columns or fields. The Datadog connector also supports adding tags to the data, either for all metrics or on a per-metric basis.

See the following pages for more information.

For information on how to create a connection to Datadog, see Datadog Connector.
For details about the contents of Decodable’s _metrics stream, see System Events and Metrics.