Delta Lake sink connector

Features

Connector name

delta-lake

Compatibility (Object store)

Amazon S3

Delivery guarantee

Exactly once

Supported task sizes

S, M, L

Multiplex capability

A single instance of this connector can write to a single Delta Lake table

Supported stream types

The Delta Lake connector supports the data types listed here.

Configuration properties

Property Description Required Default

table-path

Path to the table in the S3 bucket, using the s3a scheme.

Example: s3a://my-bucket/table_name

Yes

s3.role-arn

AWS ARN of the IAM role configured as described below.

Example: arn:aws:iam::111222333444:role/decodable-delta-access

Yes
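As an illustration of the two required properties, the following is a minimal sketch of a pre-flight check you could run before creating a connection. The helper name validate_delta_config is hypothetical and not part of any Decodable API, and the ARN pattern assumes the standard aws partition with a 12-digit account ID.

```python
import re

# Assumed shape of an IAM role ARN in the standard "aws" partition.
ROLE_ARN = re.compile(r"^arn:aws:iam::\d{12}:role/.+$")
S3A_PREFIX = "s3a://"


def validate_delta_config(props: dict) -> list:
    """Return a list of problems found in the connector properties.

    Hypothetical helper for illustration only; an empty list means
    both required properties look well-formed.
    """
    errors = []

    path = props.get("table-path", "")
    if not path.startswith(S3A_PREFIX) or len(path) <= len(S3A_PREFIX):
        errors.append("table-path must use the s3a scheme, e.g. s3a://my-bucket/table_name")

    arn = props.get("s3.role-arn", "")
    if not ROLE_ARN.match(arn):
        errors.append("s3.role-arn must be an IAM role ARN")

    return errors


config = {
    "table-path": "s3a://my-bucket/table_name",
    "s3.role-arn": "arn:aws:iam::111222333444:role/decodable-delta-access",
}
problems = validate_delta_config(config)
```

A check like this only catches formatting mistakes; it does not confirm that the bucket exists or that the role can be assumed.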

Prerequisites

Access to your AWS resources

Decodable interacts with resources in AWS on your behalf. To do this, you need an IAM role configured with a trust policy that allows access from Decodable’s AWS account, and a permission policy as detailed below.

For more details on how this works, how to configure the trust policy, and example steps to follow, see here.

To use this connector you must associate a permissions policy with the IAM role. This policy must have the following permissions:

  • Read/Write access to the S3 bucket path to which you’re writing data.

    s3:GetObject
    s3:PutObject
    s3:DeleteObject
    s3:PutObjectAcl

    To write data directly to the root level of the bucket, leave the path blank but keep the trailing /* in the resource, for example arn:aws:s3:::your-bucket/*.

  • List access on the bucket to which you’re writing data.

    s3:ListBucket

  • Sample Permission Policy
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"],
          "Resource": ["arn:aws:s3:::your-bucket/some/dir/*"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": ["arn:aws:s3:::your-bucket"]
        }
      ]
    }
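The permission policy can also be derived from the table-path value. The sketch below is one way to do that, not an official tool: build_policy is a hypothetical helper name, and it assumes object-level actions are granted on the path prefix while s3:ListBucket, which is a bucket-level action in IAM, is granted on the bucket ARN itself.

```python
import json
from urllib.parse import urlparse

# Object-level actions listed in the prerequisites.
OBJECT_ACTIONS = ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"]


def build_policy(table_path: str) -> dict:
    """Build an IAM permission policy for a table-path (hypothetical helper)."""
    parsed = urlparse(table_path)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError("table-path must look like s3a://bucket/path")

    bucket = parsed.netloc
    prefix = parsed.path.strip("/")

    # A blank path means the root of the bucket: keep the trailing /*.
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"

    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": OBJECT_ACTIONS, "Resource": [object_arn]},
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


policy = build_policy("s3a://my-bucket/table_name")
policy_json = json.dumps(policy, indent=2)
```

Calling build_policy("s3a://my-bucket") covers the root-level case, producing the object resource arn:aws:s3:::my-bucket/*.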

Connector starting state and offsets

A new sink connection starts reading from the Latest point in the source Decodable stream. This means that only data written to the stream after the connection has started is sent to the external system. When you start the connection, you can override this to Earliest if you want to send all the existing data on the source stream to the target system, along with all new data that arrives on the stream.

When you restart a sink connection, it continues reading from the position most recently stored in a checkpoint before the connection stopped. You can also opt to discard the connection’s state and restart it afresh from Earliest or Latest as described above.
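The start and restart rules above can be summarized in a small conceptual model. This is only a sketch for reasoning about the behavior, not how Decodable implements it: the function start_offset, its parameters, and the use of list indices as stream positions are all assumptions made for illustration.

```python
from typing import Optional


def start_offset(
    stream_len: int,
    mode: str = "latest",
    checkpoint: Optional[int] = None,
    discard_state: bool = False,
) -> int:
    """Return the index of the first record the sink would read.

    stream_len    -- number of records already on the stream
    mode          -- "latest" (default) or "earliest"
    checkpoint    -- position stored before the connection stopped, if any
    discard_state -- True to ignore the checkpoint and start afresh
    """
    # A restart resumes from the checkpoint unless state is discarded.
    if checkpoint is not None and not discard_state:
        return checkpoint
    if mode == "earliest":
        return 0  # send all existing data, then new data
    if mode == "latest":
        return stream_len  # send only data that arrives from now on
    raise ValueError(f"unknown starting position: {mode}")


existing = ["r0", "r1", "r2"]  # records already on the stream

new_default = start_offset(len(existing))                       # Latest
new_earliest = start_offset(len(existing), mode="earliest")     # Earliest
resumed = start_offset(len(existing), checkpoint=2)             # restart
reset = start_offset(len(existing), mode="earliest",
                     checkpoint=2, discard_state=True)          # discard state
```

In this model a new connection with the default setting skips all three existing records, while a restarted connection with a checkpoint at position 2 picks up at r2.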

Learn more about starting state here.