Create Pipelines using SQL

To have Decodable transform your data before sending it to a sink, create a pipeline and apply it to one or more Decodable streams. A pipeline is a set of data processing instructions written in SQL or expressed as an Apache Flink job; for the latter, see Create Pipelines using your own Apache Flink jobs. When you apply a pipeline to a stream, Decodable uses those instructions to process and transform the data that the pipeline receives.
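
A SQL pipeline is typically a single INSERT INTO statement that reads from one stream and writes the transformed rows to another. The following is a minimal sketch; the stream and column names are hypothetical:

    -- Read from the hypothetical stream `orders`, convert the amount
    -- from dollars to cents, and write the result to `orders_cents`.
    INSERT INTO orders_cents
    SELECT order_id, amount * 100 AS amount_cents
    FROM orders;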

Steps

Complete the following steps to create a pipeline that processes data from a Decodable stream, transforms it, and sends that data to another Decodable stream.

  1. Do one of the following to open the Pipeline Editor page.

    • On the Pipelines page, select New Pipeline and then select the stream that you’d like the pipeline to process data from.

    • On the Streams page, select the stream that contains the data that you want your pipeline to process. Select … next to Output/Outbound, and then select Create New Pipeline.

  2. To transform incoming data before sending it to a sink, modify the existing SQL statement. For more information about the common SQL patterns used to transform data in a pipeline, see the following topics and the example sketches after this list:

    • For the full list of functions that you can include in a Decodable pipeline, see Function Reference.

    • For examples of the most common pipeline patterns, such as filtering or masking data, joining two or more streams into one enriched stream, or converting an append stream to a change stream (or vice versa), see Pipeline Examples.

    • To learn about how to perform summarizations on streaming data, see Windowing Reference.
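
    For example, a pipeline that filters and masks data might look like the following sketch. The stream and field names are hypothetical:

    -- A minimal filter-and-mask sketch; `http_events`, its fields, and
    -- the output stream `http_events_clean` are hypothetical.
    INSERT INTO http_events_clean
    SELECT
      request_path,
      status_code,
      -- Mask the last octet of the client IP address.
      REGEXP_REPLACE(client_ip, '[0-9]+$', 'xxx') AS client_ip
    FROM http_events
    WHERE status_code < 400;

    A windowed summarization might use a tumbling window, as described in the Windowing Reference. This sketch assumes a hypothetical stream `orders` whose `order_time` field is configured as its event-time (watermarked) field:

    -- Count orders per five-minute tumbling window.
    INSERT INTO orders_per_window
    SELECT window_start, window_end, COUNT(*) AS order_count
    FROM TABLE(
      TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '5' MINUTES))
    GROUP BY window_start, window_end;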

  3. (Optional) To generate a preview of how your pipeline processes and transforms data, select the Run Preview button. Use the preview results to validate your SQL statement.

    • A preview session continues until 2 minutes have passed or 30 records are sampled, whichever comes first.

    • You can control where in the data stream the preview starts reading by updating the Starting Position. Latest samples records starting from the most recent position in the stream; Earliest samples records starting from the beginning of the stream. The default is Latest.

  4. The sink that your pipeline sends data to is defined in the INSERT INTO statement. If a stream with that name doesn’t exist yet, Decodable creates the output stream for you.

    When specifying columns in an INSERT INTO statement, do one of the following:

    1. Explicitly list the target columns in the order in which you supply them. This is the recommended approach because the mapping is unambiguous.

    2. Alternatively, omit the column list and supply values in the order of the columns in the target stream.

    In the following examples, the source stream src has columns src_col1 and src_col2, and the target stream my_stream has columns colX and colY. The desired outcome is to map src_col1 to colY and src_col2 to colX.

    The first approach is to list the columns explicitly, which is recommended:

    INSERT INTO my_stream (colY, colX)
    SELECT src_col1, src_col2 FROM src;

    Alternatively, you can rely on the position of the columns:

    INSERT INTO my_stream
    SELECT src_col2, src_col1 FROM src;

    Aliasing column names in the SELECT doesn’t affect the mapping; without an explicit column list, columns are still matched by position. The following writes src_col1 and src_col2 to colX and colY respectively, which isn’t the desired outcome.

    INSERT INTO my_stream
    SELECT src_col1 AS colY, src_col2 AS colX FROM src;
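
    Note that expressions and functions work as expected when you list the target columns explicitly; only the column list controls the mapping. A sketch reusing the streams above (UPPER assumes src_col1 is a string):

    -- Explicit column list: the first SELECT expression is written to
    -- colY and the second to colX, whatever the expressions are.
    INSERT INTO my_stream (colY, colX)
    SELECT UPPER(src_col1), src_col2 FROM src;
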
  5. To save your pipeline, select Next and give your pipeline a name and description.

You can now start the pipeline to begin processing and transforming data as specified in the pipeline configuration.