Key concepts and terminology


Account

An account is an isolated tenant in Decodable. It exists within a single cloud provider and region, but an account can connect to data services anywhere. All Decodable resources are scoped within an account.

Accounts have one or more identity providers (IDPs), which determine where logins are allowed from.

See Accounts for more information.

Append stream

An append stream is a type of data stream in which new records are continuously added to the end of the stream. In other words, an append stream is only ever appended to, and each individual record in an append stream is an independent event. The default retention time for append streams is 1 week.
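As an illustration of append-stream semantics, the following sketch (not the Decodable implementation) models an append stream as an append-only log with a retention window:

```python
import time

# A minimal sketch of append-stream semantics: records are only ever added
# to the end of the log, and each record is an independent event.
class AppendStream:
    def __init__(self, retention_seconds=7 * 24 * 3600):  # default: 1 week
        self.retention_seconds = retention_seconds
        self._log = []  # (arrival_time, record) pairs, in arrival order

    def append(self, record):
        self._log.append((time.time(), record))

    def read(self):
        # Records older than the retention window are no longer readable.
        cutoff = time.time() - self.retention_seconds
        return [r for (t, r) in self._log if t >= cutoff]

stream = AppendStream()
stream.append({"user": "alice", "action": "click"})
stream.append({"user": "alice", "action": "click"})  # duplicates are distinct events
assert len(stream.read()) == 2
```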

Change stream

A change stream is a type of data stream that captures changes to a database in real-time. A record in a change stream is called a Change Data Capture (CDC) record. A CDC record represents a change made to a database, such as updates, inserts, and deletes. You must specify a field to use as a primary key for all change streams. Change streams have an indefinite retention policy.

See About Change Data Capture and Change Streams for more information.


Connection

A connection is a configuration of a connector. Connections contain the identification details and credentials that Decodable uses to access external data sources or data destinations. A source connection receives data from an upstream system and sends it to a stream, while a sink connection reads from a stream and sends it to a downstream system.

See Connections for more information.


Connector

A connector is the part of the Decodable platform that connects Decodable with a data source or destination. Connectors provide integration with your applications, streaming systems, storage systems, and databases.

See the Connector Reference for information about the data sources and destinations that Decodable can connect to.

Delivery guarantees

A "delivery guarantee" refers to the assurance that data processed by Decodable will be delivered to its intended destination(s) in a reliable and consistent manner, without loss or duplication. The delivery guarantee varies depending on the connector that you are using to receive or send data. There are two types of delivery guarantees.

  • At-least-once delivery: With this delivery guarantee, Decodable ensures that each event in a data stream is processed at least once, even if failures occur during processing. All data is eventually delivered, but duplicate events may be delivered in some failure scenarios.
  • Exactly-once delivery: With this delivery guarantee, Decodable ensures that each event in a data stream is processed exactly once, without duplicates or losses. Decodable pipelines have an exactly-once delivery guarantee.

The exact delivery guarantee of each connector can be found in the Connector Reference.
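The practical difference between the two guarantees can be sketched as follows: under at-least-once delivery, a downstream consumer may need to deduplicate redelivered events, for example by tracking an idempotency key (the `event_id` field here is a hypothetical example, not a Decodable API):

```python
# Why at-least-once delivery may require deduplication downstream: after a
# failure, the same event can be redelivered, so consumers often track an
# idempotency key and drop duplicates.
def deduplicate(events, key="event_id"):
    seen = set()
    unique = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            unique.append(event)
    return unique

# The producer retried after a failure, so event 2 was delivered twice.
delivered = [
    {"event_id": 1, "value": "a"},
    {"event_id": 2, "value": "b"},
    {"event_id": 2, "value": "b"},  # duplicate from the retry
]
assert deduplicate(delivered) == [
    {"event_id": 1, "value": "a"},
    {"event_id": 2, "value": "b"},
]
```

With exactly-once delivery, this deduplication step is unnecessary because each event reaches the destination precisely once.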

Event time

Event time refers to the time at which an event actually occurred in the real world, or when the event originated from an upstream system. The event time is carried in the record itself, in a field of data type timestamp or similar.

Related term: Processing time


Pipeline

A pipeline is a set of data processing instructions written in SQL. When you create a pipeline, you write a streaming SQL query that specifies what stream(s) of data to process, how to process it, and what stream to send that data to.

See Pipelines for more information.

Processing time

Processing time refers to the "wall clock" time when Decodable processes a record.

Related term: Event time
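The distinction between the two notions of time can be illustrated with a small sketch (the field names are hypothetical): event time travels inside the record, while processing time is read from the wall clock at the moment the record is handled:

```python
from datetime import datetime, timezone

# Event time: when the event actually happened, carried in the record itself.
record = {
    "order_id": 123,
    "event_time": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
}

# Processing time: the wall-clock time when the record is processed.
processing_time = datetime.now(timezone.utc)

# Records are always processed some time after they occur; the gap between
# the two timestamps is the record's lag.
lag = processing_time - record["event_time"]
assert lag.total_seconds() > 0
```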


Record

A single piece of data in a stream. Every record contains one or more fields, and each field is associated with a specific data type (e.g. string, integer, map). Data flows through your connections, streams, and pipelines as records. Records can also be transformed via pipelines.


Resource

A user-managed component of the Decodable platform, such as a pipeline, stream, or connection. Every resource belongs to exactly one account.

Resource IDs

A unique identifier for a Decodable resource. All resources in Decodable have a generated resource ID that is unique within your account. Resource IDs are short strings of letters and numbers and, like Git SHAs, should be treated as opaque UTF-8 strings.


Schema

A schema defines the structure of the records flowing through a connector or stream. It specifies the fields that the records contain and the types of values that each field can hold.

There are two types of schemas: a logical schema and a physical schema.

  • Logical schema: The schema managed by the user. When you are asked to define a schema in Decodable, you are defining the logical schema of a record.
  • Physical schema: The schema that describes how the data is actually stored and processed. The physical schema of a record is normally the same as its logical schema. The exception to this rule is a change record, also known as a change data capture record, which is a record in a change stream. Since change records are stored internally in Debezium format, the physical schema of a change data capture record includes the type of operation performed and the values of the affected fields before and after the change. For more specifics on the physical schema of change data capture records, see Change Record Schema in the About Change Data Capture and Change Streams topic.
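As a sketch of the difference, a change record in a Debezium-style envelope might look like the following (the field values are hypothetical, and the exact Decodable layout may differ): the logical schema is just the row, while the physical schema additionally carries the operation type and the row's state before and after the change.

```python
# A hypothetical change record in a Debezium-style envelope. The logical
# schema is just the row (id, email); the physical schema also includes the
# operation and the before/after images.
change_record = {
    "op": "u",  # operation: "c" = create/insert, "u" = update, "d" = delete
    "before": {"id": 42, "email": "old@example.com"},
    "after": {"id": 42, "email": "new@example.com"},
}

def apply_change(table, record, primary_key="id"):
    """Apply one change record to an in-memory table keyed by primary key."""
    if record["op"] == "d":
        table.pop(record["before"][primary_key], None)
    else:  # create or update: the "after" image wins
        table[record["after"][primary_key]] = record["after"]
    return table

table = {42: {"id": 42, "email": "old@example.com"}}
apply_change(table, change_record)
assert table[42]["email"] == "new@example.com"
```

This also illustrates why change streams need a primary key: it is what identifies which row a change applies to.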


Stream

A stream transports records within Decodable. You can add pipelines between streams that read, transform, and write records to another stream, or you can have one stream write directly to another. Source connections read data from an external system and write records to a stream, while sink connections read records from a stream and write to an external system. Streams have a defined schema that specifies the fields present in their records. Once a stream has been defined, it can be used as input or output to any number of pipelines and connections.

Decodable accounts come with special read-only streams called _metrics and _events which contain metrics and an auditable history of your account’s resources.

See Streams for more information.


Task count

Task count determines the amount of resources available for pipelines and connections to process data. Tasks run in parallel, allowing processing to scale out as needed. Decodable allocates up to the number of tasks you specify, although it may allocate fewer if it determines that a task would be idle. The amount of processing capacity a task can provide varies based on the size of a record, the complexity of the query, and the speed of the source- and sink-connected systems. Typically, a task can process 2-8 MiB or 1,000-10,000 records per second.
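As a rough illustration of how the throughput range above translates into a task count (illustrative arithmetic only, not a Decodable sizing formula — the assumed per-task capacity of 5,000 records per second is simply a midpoint of the quoted range):

```python
import math

# Back-of-the-envelope sizing: divide the expected record rate by an assumed
# per-task capacity and round up. Real capacity varies with record size,
# query complexity, and the speed of connected systems.
def estimate_task_count(records_per_second, per_task_capacity=5_000):
    return max(1, math.ceil(records_per_second / per_task_capacity))

assert estimate_task_count(12_000) == 3  # 12,000 rps at ~5,000 rps per task
assert estimate_task_count(500) == 1     # small workloads still need one task
```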

See Manage task count and sizes for more information.


Watermark

A watermark is an internal data structure that Decodable uses to track the progression of event time in a pipeline. Watermarks provide a cutoff past which Decodable can reasonably assume that all records with a timestamp earlier than the watermark have arrived. Users can create watermarks on streams by specifying the field that contains the event time and the amount of time to wait for out-of-order (that is, late-arriving) data. All pipelines reading from the stream use this watermark.

When working with multiple streams of data, it is common for data to arrive out of order. For example, an event’s timestamp might indicate that it happened a day earlier than the previous event. Watermarks enable Decodable to perform computations like window aggregations accurately. A watermark with timestamp t can be understood as an assertion that all records with timestamps < t have arrived; when Decodable receives a watermark whose timestamp passes the end of a window, it closes the window, then computes and emits the window’s result.
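The mechanics described above can be sketched as follows (an illustration of the concept, not Decodable's internal implementation): the watermark trails the maximum event time seen so far by the configured lateness allowance, so out-of-order records never move it backwards.

```python
from datetime import datetime, timedelta

# The watermark = (max event time seen) - (allowed lateness). It asserts
# that records with earlier timestamps have, in all likelihood, arrived.
class WatermarkTracker:
    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None

    def observe(self, event_time):
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    @property
    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

tracker = WatermarkTracker(allowed_lateness=timedelta(minutes=5))
tracker.observe(datetime(2024, 1, 1, 12, 10))
tracker.observe(datetime(2024, 1, 1, 12, 4))  # out of order; watermark holds
assert tracker.watermark == datetime(2024, 1, 1, 12, 5)
```

Any window ending at or before 12:05 could now be closed, since the watermark asserts that earlier records have arrived.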

See Managing streams for information about how to specify a watermark.


Window

A window is a segment of a data stream with a start time and an end time. When working with streaming data, or data that has no predetermined size or end, windows allow you to summarize a series of records. Windows can be defined using either event time or processing time. You can use windowing functions to define windows in pipelines, allowing you to perform summarizations on your data stream such as count or sums.
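As a sketch of event-time windowing (illustrative only, not a pipeline windowing function), a tumbling window assigns each record to the fixed-size, non-overlapping window containing its timestamp and then summarizes per window:

```python
from collections import Counter
from datetime import datetime, timezone

# Assign a timestamp to the start of its tumbling window by rounding the
# epoch time down to a multiple of the window size.
def window_start(ts, size_seconds):
    epoch = ts.timestamp()
    return datetime.fromtimestamp(epoch - (epoch % size_seconds), tz=ts.tzinfo)

events = [
    datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 0, 50, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 1, 5, tzinfo=timezone.utc),
]

# Count records per 60-second tumbling window.
counts = Counter(window_start(ts, 60) for ts in events)
assert counts[datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)] == 2
assert counts[datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc)] == 1
```

Here the summarization is a count, but the same window assignment underlies sums, averages, and other aggregations over streaming data.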

See the Windowing Reference for more information.
