Manage schemas

A schema defines the structure of the records flowing through Decodable. It specifies the fields that the records contain and the data types of those fields. Decodable uses schemas to validate that the data being ingested adheres to the expected format. This means that the schema of a stream must match the schema of the connection that it is attached to.
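For example, a stream carrying order events might have a schema like the following. This is an illustrative sketch: the field names are made up, and the types are standard Decodable data types.

  • order_id: BIGINT

  • customer: STRING

  • order_time: TIMESTAMP(3)

A connection attached to this stream must produce or consume records with exactly these fields and types.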

You can view the schema of a connection or stream by doing the following.

  1. Navigate to the Streams or Connections page.

  2. Select the stream or connection whose schema you want to view. The Overview page for that resource opens.

  3. Select the Schema tab to view the schema for the resource.

How to use the Schema page

Use the Schema page to define the schema of your data. Schemas help ensure that the data flowing through Decodable stays consistent and accurate even as it’s processed and handled by different systems and applications.

If you want to perform a schema migration or make any changes to a schema that is attached to an actively running connection, make sure that you review the Manage schemas topic before continuing.

The following screenshot and accompanying list describe the components of the Schema page. The numbers correspond to the callouts in the image.

A screenshot of part of the schema management page with numbers highlighting the features described in the list below

  1. Partition Key Fields / Primary Key Fields: One or more fields to use as either the partition key or the primary key.

     Partition Key: A field that helps determine how your data is partitioned. If you don’t specify a partition key, records are distributed randomly across all stream partitions. See Stream Types for more information.

     Primary Key: A field whose value uniquely identifies each record. See Change Record Schema for more information.

  2. Name: The field name.

  3. Type: The field type. For a list of supported data types, see Decodable data types.

  4. Partition Key / Primary Key: Select this icon to use the field as either the partition key or the primary key. To use a field as a primary key, you must also explicitly declare its type as not null by entering <type> NOT NULL. For example: BIGINT NOT NULL.
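     For example, when creating a stream with the Decodable CLI, you can declare a field as not null directly in its type. The following command is an illustrative sketch: the stream and field names are made up, and you should confirm the exact flags for your CLI version with decodable stream create --help.

       • decodable stream create --name orders --field order_id="BIGINT NOT NULL" --field amount=DOUBLE

     Here, order_id can serve as a primary key because its type is explicitly declared NOT NULL.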

  5. Watermark: A field that can be used to track the progression of event time in a pipeline. You can also include an optional interval to allow for late-arriving data. The field must be one of the following types: TIMESTAMP(3), TIMESTAMP(2), TIMESTAMP(1), TIMESTAMP_LTZ(3), TIMESTAMP_LTZ(2), or TIMESTAMP_LTZ(1).

     To specify a watermark in the Decodable CLI, use the --watermark flag when you’re defining your schema:

       • --watermark "timestamp_field AS timestamp_field"

       • --watermark "timestamp_field AS timestamp_field - INTERVAL '0.001' SECOND"

       • --watermark "timestamp_field AS timestamp_field - INTERVAL '5' MINUTE"

     The first example assumes strictly ordered data: there is no allowance for late-arriving data, and any records that arrive late are discarded.

     The second and third examples include a grace period to accommodate late-arriving data.

     For a full example of specifying a watermark using the Decodable CLI, see the Pipelines example in the Using the Decodable CLI Tutorial.
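     Putting it together, a complete stream definition with a watermark might look like the following sketch. The stream and field names are illustrative; verify the flags for your CLI version with decodable stream create --help.

       • decodable stream create --name web_events --field event_time=TIMESTAMP(3) --field page=STRING --watermark "event_time AS event_time - INTERVAL '5' SECOND"

     This declares event_time as the watermark field and allows up to five seconds for late-arriving data.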

  6. Data Type Options: One of the following:

     Physical: Use this option when the field and the field’s type are physically present in the record. This is the most common option.

     Computed: Use this option if you want to compute the field’s value from an expression. The expression can include any supported functions and can reference other fields in the schema. Computed fields let you derive a new field from existing data. For example, if you want a field to contain the time at which Decodable processed the record, enter the expression PROCTIME().

     Metadata: Use this option if you are using a connector that supports additional metadata fields. Connectors that support this option are Apache Kafka and Amazon Kinesis.
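     To illustrate, here are a few expressions you might enter for a computed field. The referenced field names (price, quantity, customer_name) are hypothetical and assume matching physical fields exist in your schema; check the Decodable documentation for the full list of supported functions.

       • PROCTIME(): the time at which Decodable processed the record.

       • price * quantity: a total derived from two numeric fields.

       • UPPER(customer_name): a normalized copy of a string field.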

  7. Delete: Removes the field from the stream schema.

  8. Import Schema: Select this button to upload a sample of your JSON- or Avro-formatted data and infer the schema from that sample.
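     For example, uploading a JSON sample like the following (a made-up record; the exact types that Decodable infers may differ):

       {"order_id": 1001, "customer": "acme", "amount": 19.99, "order_time": "2023-05-01T12:00:00Z"}

     might produce fields such as order_id (BIGINT), customer (STRING), amount (DOUBLE), and order_time (STRING or a TIMESTAMP type, depending on how the value is interpreted). Review the inferred schema and adjust any field types before saving.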
