Web Quickstart Guide

Decodable is a real-time stream processing platform that collects data from sources, processes that data using SQL, and delivers it to one or more destinations of your choice. This guide walks you through an end-to-end example of using Decodable to parse Envoy logs in real time, with the goal of getting you familiar enough with the platform to tackle your own use case. We’ll assume that you’ve already created your account and logged in successfully. For your convenience, we've created some resources in your account to get you started.

In this guide, you will perform the following steps:

  • Send data from a connector to a data stream.
  • Create a pipeline that reads data from a stream, parses it into structured records, and sends the results to a different stream.
  • Create a second pipeline that aggregates the parsed data and writes the aggregated results to an output stream.

Create a connection to get data into a stream

Connections read data from an external system into a Decodable stream, or send data from a Decodable stream to an external system. In this section, you'll start a pre-created connection to get data into a stream.

First, select the Connections tab to view all the connections in your account. You will see one connection that has been pre-created for you.

This connection uses the datagen connector, which generates test data. It is a source connector, meaning that it reads data from an external system into a stream. Let's activate the connection to get some data flowing: select '...' to open the more options menu, and then select Start.

You’ll soon see the Actual State of the connection transition to 'Running'. At this point, the connection is writing test data to a stream called envoy_raw, which has been pre-created for you.

The data has a single field named value with a type of STRING. Its content emulates raw HTTP event logs in JSON format, as in the examples below:

{"value":"[2021-11-05T04:48:03Z] \"GET /products/3 HTTP/1.1\" 500 URX 2001 6345 82 32 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64)\" \"5c70092a-ed05-4d0c-9d2b-f7bf361ad17e\" \"localhost\" \"192.168.0.11:443\""}
{"value":"[2021-11-05T04:48:03Z] \"DELETE /users/1 HTTP/2.0\" 200 NC 4044 3860 41 39 \"-\" \"Mozilla/5.0 (Linux; Android 10) \" \"5ca9fd79-afee-44db-9352-2ee9949dc6df\" \"aws.gateway\" \"10.0.0.1\""}
{"value":"[2021-11-05T04:48:03Z] \"DELETE /products/2 HTTP/2.0\" 500 UH 3826 8831 14 33 \"-\" \"Mozilla/5.0 (Linux; Android 10) \" \"5f0ae73d-c76b-471f-9458-3efc45128509\" \"aws.gateway\" \"10.0.0.1\""}

Now that we have some data flowing, let’s see what we can do with it. Select the datagen_envoy_connection.

Select Outbound to 1 stream - envoy_raw to view the stream we are connected to.

Preview your incoming data by inspecting the stream

You will now see the envoy_raw stream overview with a preview of the data. Above the sample data, you can see the stream's inbound source and outbound destination. In this case, the inbound data for the envoy_raw stream comes from the datagen_envoy_connection, and there is currently no outbound destination configured for this stream. Select '...' next to No outputs to open the more options menu, and then select Create a pipeline to open the pipeline editor.

Create a pipeline to parse your data into structured records

Now we’re ready to build our pipeline. A pipeline is a SQL query that processes data from one or more input streams and writes the results to an output stream.
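
In its simplest form, that query is just an insert-select. For example, a minimal pass-through pipeline that copies every record from envoy_raw to another stream might look like the sketch below (my_output_stream is only a placeholder name here, not one of the pre-created streams):

-- A pass-through sketch: read each raw record from envoy_raw and write it,
-- unchanged, to a hypothetical output stream.
insert into my_output_stream
select `value`
from envoy_raw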

In the Pipeline Editor, you'll see three panes: a Data Catalog pane, a SQL pane, and a Preview pane. The SQL pane is where you can author your pipeline SQL. A simple INSERT query has been pre-constructed for you based on the input stream.

The Preview pane lets you visually inspect your data before it's sent to another stream. It gives you real-time visibility into the contents of your streams and provides a way to iterate on your pipeline so that your data ends up in the destination in the format that you expect. Select Run Preview to preview your pipeline. The results should appear in under a minute.

You should see JSON records with a single field named value. The field contains plaintext log entries of API activity in Envoy’s logging format.

To make use of these logs, we’ll first need to parse them. Copy and paste the following SQL statement into the SQL pane, then select Run Preview again.

-- Extract Envoy fields from a map as top level fields and insert them into the
-- http_events stream.
insert into http_events

with mapped as (
  select
    grok(
      `value`,
      '\[%{TIMESTAMP_ISO8601:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} %{DATA:version}" %{NUMBER:status}'
    ) as fields
  from envoy_raw
)

select
  to_timestamp(fields['timestamp'], 'yyyy-MM-dd''T''HH:mm:ss''Z''') as `timestamp`,
  fields['method'] as `method`,
  fields['request'] as `request`,
  fields['version'] as `version`,
  cast(fields['status'] as int) as status
from mapped
Now you'll see that the JSON has been parsed into structured records, and some interesting fields have been extracted as top-level fields in your data. Select Next.

The name of the output stream and the names and types of the fields in the stream’s schema are based on the earlier SQL statement. Give the stream a description of your choosing, such as 'parsed Envoy logs', and select Create Stream and then Next.

Give this new pipeline a name and description of your choosing.

Go ahead and select Create Pipeline.

You’ll be brought to the pipeline overview page, showing the pipeline in a Stopped state.

Once a pipeline has been created, you need to start it. Select Start and wait for the pipeline's state to switch from Stopped to Running. Once the pipeline is running, the infrastructure has been provisioned and data is flowing. You'll also see pipeline metrics start to appear.

Create a pipeline that summarizes data

Now that you have properly parsed data flowing through Decodable, let's add a second pipeline to your workflow that aggregates the parsed data and returns a count of the API activity. The source of this second pipeline will be the http_events stream, which contains the data that you parsed in the previous step. Select Outbound to 1 stream - http_events to switch to the detailed view of the http_events stream. We will create another pipeline to aggregate the data from this stream.

The first step in performing a time-based aggregation is to define a field to use for a watermark. A watermark is a timestamp that is associated with each record in the stream and used to track the progress of event time for a stream of data. Watermarks enable time-based operations, such as windowing and time-based aggregations, to work correctly. See Watermarks in the Managing Streams topic for more information. Watermarks must be defined in the input stream's schema, so let's choose which field to use for the watermark before creating the pipeline.

To define a watermark, select Schema and then select Watermark. In the Event Time Field, enter timestamp. In this view, you can also specify how long you'd like to wait for late-arriving records after the end of a window. In this tutorial, we'll use a window of 15 seconds and a Max Lateness of 10 seconds. This means that records arriving up to 10 seconds late, relative to the newest event time seen so far, are still assigned to the window they belong to instead of being dropped. Make sure to select Save to persist your changes.
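
Conceptually, this watermark trails the newest event time seen so far by the Max Lateness. As a rough illustration only (you don't write this yourself; the Schema view configures it for you), the strategy corresponds to a Flink-style watermark expression along these lines:

-- Illustrative only: a watermark that lags the newest observed event time
-- by 10 seconds, matching the Max Lateness configured above.
watermark for `timestamp` as `timestamp` - interval '10' second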

Select Overview to return to the details page for the http_events stream. Then, select '...' next to No outputs to open the more options menu, and select Create a pipeline to open the pipeline editor. We'll now create a second pipeline that performs an aggregation on this data.

In the pipeline editor, copy and paste the following SQL statement into the SQL pane, then select Run Preview again.

insert into aggregated_events
select window_start, window_end, `method`, count(*) as `count`
from table(
    tumble(table http_events, descriptor(`timestamp`), interval '15' seconds))
group by window_start, window_end, `method`

Each record is a count of how many times a specific method was invoked within a 15-second window. Select Next to give this pipeline a name and a description.

When you are ready, select Create Pipeline. Once again, you'll need to start your pipeline after creating it. Select Start and wait for the pipeline's state to switch from Stopped to Running.

Conclusion

The last step in any Decodable workflow is to create a connection to the destination that you want to send your data to. This tutorial skips this step. However, once you are ready to incorporate Decodable into your data workflows, remember that you'll need to create a connection between the appropriate stream and the desired data destination by using a sink connector. See the topics in the Connector Reference for more information about which destinations you can send data to.

Congratulations! You've reached the end of the tutorial. You are now ready to build and run your own pipelines using our test data, or configure connections to your own data infrastructure to get started in earnest.

In this tutorial, you have:

  • Sent data from a source connector to a stream via a connection.
  • Created a pipeline that processed data from a stream, parsed it into structured records, and sent the results to a different stream.
  • Created a second pipeline that aggregated that data and wrote the aggregated results to an output stream.