Terminology

Accounts

An account is an isolated tenant in Decodable. It exists within a single cloud provider and region, but an account can connect to data services anywhere. All Decodable resources are scoped within an account.

Accounts have one or more identity providers (IDPs), which determine where logins are allowed from.

Append stream

An append stream is a type of data stream in which new records are continuously added to the end of the stream. In other words, an append stream is only ever appended to, and each individual record in an append stream is an independent event. The default retention time for append streams is one week.

Related term: Change stream

Change stream

A change stream is a type of data stream that captures changes to a database in real time. Each record in a change stream represents a change made to a database, such as an update, insert, or delete. A record in a change stream is called a Change Data Capture (CDC) record. Change streams have an indefinite retention policy.

Related term: Append stream

Checkpoint

A checkpoint is a mechanism that enables fault tolerance and guarantees exactly-once processing for records in pipelines.

Connections

A connection is a configuration of a connector. Connections contain the identification details and credentials that Decodable uses to access data sources or data destinations. You must create a connection to get data from a source into a stream or send data from a stream to a destination.

Connector

A connector is the part of the Decodable platform that connects Decodable with a data source or destination. Connectors provide integration with your applications, streaming systems, storage systems, and databases.

Some connectors are data sources, some are data destinations, and some can be used as either. A source connection receives data from an upstream system and sends it to a stream, while a sink connection reads from a stream and sends data to a downstream system.

See Connect to a data source for a list of available sources and Connect to a data destination for a list of available destinations.

Delivery guarantees

A "delivery guarantee" refers to the assurance that data processed by Decodable is delivered to its intended destination(s) reliably and consistently, without loss or duplication. The delivery guarantee varies depending on the connector you are using to receive or send data. There are two types of delivery guarantees:

  • At-least-once delivery: With this delivery guarantee, Decodable ensures that each event in a data stream is processed at least once, even if failures occur during processing. All data is eventually delivered, but duplicate events might be delivered in some failure scenarios.

  • Exactly-once delivery: With this delivery guarantee, Decodable ensures that each event in a data stream is processed exactly once, without duplicates or losses.

Refer to the specific connector topic for the exact delivery guarantee.

Event time

Event time refers to the time at which an event actually occurred in the real world or when the event originated from an upstream system. The event time is present in the record itself and is usually in a field called timestamp or similar.

Related term: Processing time

Pipeline

A pipeline is a set of data processing instructions written in SQL or expressed as an Apache Flink job. When you create a pipeline, you write a streaming SQL query that specifies what stream(s) of data to process, how to process it, and what stream to send that data to.
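
For example, a simple pipeline might read from one stream, keep only the records that match a condition, and write the result to another stream. The stream and field names in the following sketch are hypothetical:

    -- Minimal pipeline sketch (hypothetical stream and field names).
    -- Reads from the http_events stream, keeps only server errors,
    -- and writes the matching records to the http_errors stream.
    INSERT INTO http_errors
    SELECT ts, method, path, status_code
    FROM http_events
    WHERE status_code >= 500;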

See About Pipelines for more information.

Processing time

Processing time refers to the "wall clock" time when Decodable processes the record. The processing time is assigned to every record in a field called time.

Related term: Event time

Record

A record is a single piece of data in a stream. Data flows through your connections, streams, and pipelines as records. Decodable assigns every record a timestamp value that represents when the record was processed.

Resource IDs

A resource ID is a unique identifier for a Decodable resource. All resources in Decodable have a generated resource ID that is unique within your account. Resource IDs let you rename resources freely without breaking your pipelines and other configurations. They are short strings of letters and numbers, similar to Git SHAs, and like Git SHAs they should be treated as opaque UTF-8 strings.

Schema

A schema defines the structure of the records flowing through a connector or stream. It specifies the fields that the records contain and the types of values that each field can hold.

There are two types of schemas: a logical schema and a physical schema.

  • Logical schema: The schema managed by the user. When you are asked to define a schema in Decodable, you are defining the logical schema of a record.

  • Physical schema: The schema that describes how the data is actually stored and processed. The physical schema of a record is normally the same as its logical schema. The exception to this rule is a change data capture record. Since change data capture records are stored internally in Debezium format, the physical schema of a change data capture record includes the type of operation performed and the values of the affected fields before and after the change (see the sketch following this list). For more specifics on the physical schema of change data capture records, see Change Record Schema in the About Change Data Capture and Change Streams topic.
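
As a rough illustration, the following SQL sketch constructs one Debezium-style change record inline for a hypothetical order that moved from PENDING to SHIPPED. The field names and values are made up; the actual internal layout is described in the Change Record Schema reference.

    -- Illustrative sketch only: the general shape of a change data capture record.
    SELECT
      'u'                  AS op,        -- operation: 'c' = insert, 'u' = update, 'd' = delete
      ROW(1001, 'PENDING') AS `before`,  -- field values before the change (NULL for an insert)
      ROW(1001, 'SHIPPED') AS `after`;   -- field values after the change (NULL for a delete)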

Streams

A stream transports records within Decodable. Pipelines and connections always read or write records to a stream. Streams have a defined schema which specifies the fields that are present in its records. Once a stream has been defined, it can be used as input or output to any number of pipelines and connections.

See Streams for more information.

Tasks

A task determines the amount of resources available for pipelines and connections to process data. Tasks run in parallel, allowing processing to scale out as needed. Decodable allocates up to the number of tasks you specify, although it might allocate fewer if it determines that a task would be idle. The amount of data a task can process varies based on record size, query complexity, and the speed of the source and sink systems. Typically, a task can process 2-8 MiB of data, or 1,000-10,000 records, per second.
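
As a rough worked example using the ranges above, a source that produces about 20,000 records per second would typically need somewhere between 2 and 20 tasks, with the exact count depending on record size, query complexity, and the connected systems.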

See Manage task count and sizes for more information.

Watermark

A watermark is a field used to track the progression of event time in a pipeline.

When working with streams of data, it is common for records to arrive out of order. For example, an event may arrive after other events that carry later timestamps. Watermarks track the progress of time in a stream and signal that all records with a timestamp earlier than the watermark have arrived.
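
Because pipelines ultimately run as Apache Flink jobs, the idea can be sketched in Flink-style SQL, where a watermark is declared against an event-time field with an allowed delay. The names and the five-second delay below are hypothetical and serve only to show the shape of the declaration:

    -- Flink-style sketch only (hypothetical stream and field names).
    -- The watermark trails event_time by 5 seconds, so records arriving up to
    -- 5 seconds out of order can still be assigned to the correct windows.
    CREATE TABLE page_views (
      user_id    BIGINT,
      url        STRING,
      event_time TIMESTAMP(3),
      WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    );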

See Manage schemas for information about how to specify a watermark.

Related term: Window

Window

A window is a segment of a data stream with a start time and an end time. When working with streaming data, or data that has no predetermined size or end, windows allow you to summarize a series of records. You can use windowing functions to define windows in pipelines, allowing you to run summarizations over your data stream, such as counts or sums.
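
As an illustration, the following sketch uses Flink's TUMBLE windowing function to count records per URL in fixed one-minute windows; the stream and field names are hypothetical:

    -- Tumbling-window sketch (hypothetical stream and field names).
    -- Groups page_views into fixed one-minute windows keyed by url and
    -- counts the records in each window.
    INSERT INTO page_views_per_minute
    SELECT window_start, window_end, url, COUNT(*) AS view_count
    FROM TABLE(
      TUMBLE(TABLE page_views, DESCRIPTOR(event_time), INTERVAL '1' MINUTE)
    )
    GROUP BY window_start, window_end, url;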

See Windowing Reference for more information.

Related term: Watermark