Joining streaming data

One of the main use cases of Decodable is data enrichment, where data from multiple sources is combined to create a more comprehensive and meaningful view of the data.

Joining is a fundamental data enrichment operation that combines records from multiple sources based on a common key. With Decodable, you can perform three types of joins: regular joins, windowed joins, and temporal joins (also called time-based lookup joins).

The process of joining involves combining two or more streams of data based on a specific condition or set of conditions. For example, you can join two streams based on a common field such as a customer ID, product ID, or timestamp. You can also join different types of streams: change streams to change streams, change streams to append streams, or append streams to append streams.

Data enrichment through joins can help you to gain insights and make informed decisions based on a more comprehensive view of your data. By combining data from multiple sources, you can identify patterns, trends, and relationships that would not be apparent from analyzing each source of data separately.

Benefits of real-time joins vs traditional joins

Traditionally, most data processing systems relied on batch processing. In these systems, data is processed on a set schedule. For example, a nightly batch job might process all the data accumulated during the day. This can result in stale data: there can be a discrepancy between the actual state of the world and what the system reports, which prevents you from taking timely action on insights.

With Decodable, you can perform real-time joins on records as they arrive in the system. Real-time joins in Decodable provide immediate results without compromising accuracy. We guarantee exactly-once delivery semantics, so there is no need for special cases to manage data integrity at batch join boundaries.

Types of joins

Decodable supports several types of streaming joins. The following summary gives a quick overview of the types of joins that you can do in Decodable.

Each entry lists the join type, a description, when the join works best, and its inputs and outputs.

Regular joins

Description: A join operation that matches records based on a common field or set of fields. These joins provide the most flexibility, but that flexibility comes at a cost. Because the state of each record must be kept indefinitely so that Decodable knows when a record has been updated, regular joins can be inefficient and consume a lot of resources.

Best when: You want to join two streams that have relatively low data throughput. For example, you want to output every matching pair of records from the two sources.

Inputs: Change streams or append streams

Outputs: Change stream (except for an inner join of only append streams, which outputs an append stream)
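To illustrate why a regular join has to keep state indefinitely, here is a minimal conceptual sketch in Python of an inner join between two append streams. It is not Decodable's implementation or API. The class name `RegularJoin`, its methods, and the record shapes are all hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class RegularJoin:
    """Conceptual inner join of two append streams on a shared key.

    Hypothetical illustration only: every record from both sides is
    retained forever, because any future record may still match it.
    """

    def __init__(self):
        # Join state: all records seen so far on each side, grouped by key.
        self.left_state = defaultdict(list)
        self.right_state = defaultdict(list)

    def on_left(self, key, record):
        """Store a left record and return its matches with right records seen so far."""
        self.left_state[key].append(record)
        return [(record, right) for right in self.right_state.get(key, [])]

    def on_right(self, key, record):
        """Store a right record and return its matches with left records seen so far."""
        self.right_state[key].append(record)
        return [(left, record) for left in self.left_state.get(key, [])]

    def state_size(self):
        """Number of records currently held in state; it never shrinks."""
        left = sum(len(records) for records in self.left_state.values())
        right = sum(len(records) for records in self.right_state.values())
        return left + right
```

Each arriving record is matched against everything stored for the other side, so every matching pair is emitted exactly once, whichever side arrives first. Because nothing is ever evicted, `state_size()` grows with the total number of records, which is why this kind of join suits lower-throughput streams.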

Windowed joins

Description: A join operation that matches records based on a sliding or tumbling window of time. This join differs from the regular join because the state of each record is only kept for the time specified in the time window.

Documentation coming soon. If you need help with windowed joins, contact us.

Best when: You want to join two high-throughput streams and summarize the merged stream in a given time window. For example, you want to join two streams based on a sliding window of 30 seconds.

Inputs: Append streams

Outputs: Append stream
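The difference in state handling can be sketched as follows. This is a conceptual Python illustration of a tumbling-window join, not Decodable's implementation or API. The class `TumblingWindowJoin`, the 30-second default, and the `advance_time` method are assumptions made for this example. It also simplifies by emitting matches as soon as they are found, whereas a real engine typically emits results when a window closes, and it does not cover sliding windows.

```python
from collections import defaultdict


class TumblingWindowJoin:
    """Conceptual join of two append streams within fixed, non-overlapping windows.

    Hypothetical illustration only: records match when they share a key
    and their timestamps fall into the same window.
    """

    def __init__(self, window_seconds=30):
        self.window_seconds = window_seconds
        # state[window_start][side][key] -> records buffered for that window
        self.state = defaultdict(
            lambda: {"left": defaultdict(list), "right": defaultdict(list)}
        )

    def _window_start(self, ts):
        # Align a timestamp (in seconds) to the start of its tumbling window.
        return ts - ts % self.window_seconds

    def on_record(self, side, ts, key, record):
        """Buffer a record and return matches from the other side in the same window."""
        other = "right" if side == "left" else "left"
        window = self.state[self._window_start(ts)]
        window[side][key].append(record)
        matches = window[other].get(key, [])
        if side == "left":
            return [(record, match) for match in matches]
        return [(match, record) for match in matches]

    def advance_time(self, now):
        """Drop the state of every window that ended at or before `now`."""
        expired = [
            start for start in self.state if start + self.window_seconds <= now
        ]
        for start in expired:
            del self.state[start]
        return len(expired)
```

Unlike the regular join, state here is bounded: once time passes the end of a window, `advance_time` discards everything buffered for it, so memory use depends on the window size rather than on the full history of both streams.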

Time-based lookup joins

Description: A join operation that matches records from a data stream with records from a constantly changing table based on a time interval or timestamp range.

Best when: You want to enrich your data with data from a table that constantly changes over time. For example, you want to join a stream with a temporal table, that is, an ever-changing table that stores historical data such as exchange rates or stock prices.

Inputs: One change stream and one append stream

Outputs: Append stream
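The exchange-rate example can be sketched in Python as an "as of" lookup. This is a conceptual illustration only, not Decodable's implementation or API. The `VersionedTable` class, the `enrich` function, and the field names `currency`, `ts`, and `amount` are hypothetical. The change stream is modeled as a series of upserts, and each append-stream record is enriched with the table value that was valid at the record's own timestamp.

```python
import bisect
from collections import defaultdict


class VersionedTable:
    """Conceptual temporal table: keeps every version of a value per key."""

    def __init__(self):
        # Parallel lists per key, kept sorted by the time each version took effect.
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def upsert(self, key, ts, value):
        """Apply a change: `key` has `value` from time `ts` onward."""
        idx = bisect.bisect_right(self._times[key], ts)
        self._times[key].insert(idx, ts)
        self._values[key].insert(idx, value)

    def as_of(self, key, ts):
        """Return the value that was valid for `key` at time `ts`, or None."""
        times = self._times.get(key, [])
        idx = bisect.bisect_right(times, ts)
        if idx == 0:
            return None  # no version existed yet at that time
        return self._values[key][idx - 1]


def enrich(order, rates):
    """Join one append-stream record with the rate valid at the order's timestamp."""
    rate = rates.as_of(order["currency"], order["ts"])
    if rate is None:
        return None  # inner-join behavior: drop records with no matching version
    return {**order, "rate": rate, "amount_usd": order["amount"] * rate}
```

Each input record produces at most one output record, and later changes to the table do not revise results that were already emitted. This matches the append stream output listed for this join type.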