Delta Lake sink connector

Delta Lake is an open-source project that enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing on top of existing data lakes such as Amazon S3, Azure Data Lake Storage (ADLS), Google Cloud Storage (GCS), and HDFS. Specifically, Delta Lake offers:

  • ACID transactions on Spark

  • Scalable metadata handling

  • Streaming and batch unification

  • Schema enforcement

  • Upserts and deletes

Features

Delivery guarantee: Exactly once

The Delta Lake connector streams data in Delta Lake format to an S3 bucket in your AWS account.

For more detailed information about configuring Delta Lake, see the Delta Lake Quickstart guide and related documentation.
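On S3, a Delta table is a directory of Parquet data files plus a `_delta_log/` directory of JSON commit files, and this transaction log is what provides the ACID guarantees listed above. The minimal Python sketch below, based on the Delta transaction log protocol, shows how a reader derives the table's current data files by replaying `add` and `remove` actions in version order. It is simplified (no checkpoints, and real `add` actions carry more fields than `path`), it is not how the Decodable connector is implemented, and `commit_key` and `live_files` are hypothetical helper names.

```python
import json


def commit_key(table_path: str, version: int) -> str:
    """Return the location of the JSON commit file for one table version.

    Per the Delta protocol, commit files live under <table>/_delta_log/
    and are named with the version number zero-padded to 20 digits.
    """
    return f"{table_path.rstrip('/')}/_delta_log/{version:020d}.json"


def live_files(commits: list[str]) -> set[str]:
    """Replay commits (oldest first) and return the data files still in the table.

    Each commit is newline-delimited JSON with one action per line. Only
    "add" and "remove" actions are handled; every other action type is
    ignored in this simplified sketch.
    """
    files: set[str] = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


commits = [
    # Version 0: two data files are written.
    '{"add": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0001.parquet"}}',
    # Version 1: part-0000 is replaced by a rewritten file in a single commit.
    '{"remove": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0002.parquet"}}',
]

print(commit_key("s3a://my-bucket/table_name", 1))
print(sorted(live_files(commits)))
```

Because a change becomes visible only once its commit file exists, readers see either all of a commit's actions or none of them.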

Prerequisites

Access to your AWS resources

Decodable interacts with resources in AWS on your behalf. To do this, you need an IAM role configured with a trust policy that allows access from Decodable's AWS account, and a permissions policy as detailed below.

For more details on how this works, how to configure the trust policy, and example steps to follow, see here.
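For orientation, the general shape of a cross-account trust policy, which lets principals in another AWS account call `sts:AssumeRole` on your role, is sketched below in Python. The account ID is a hypothetical placeholder, not Decodable's real account; take the actual value, and any required conditions such as an external ID, from Decodable's setup guide. `trust_policy` is an illustrative helper name.

```python
import json

# Hypothetical placeholder: replace with the AWS account ID given in
# Decodable's setup guide. This is NOT a real Decodable account ID.
DECODABLE_ACCOUNT_ID = "000000000000"


def trust_policy(account_id: str) -> dict:
    """Build a trust policy allowing principals in `account_id` to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # Add any Condition block (for example on sts:ExternalId)
                # that Decodable's guide requires; none is assumed here.
            }
        ],
    }


print(json.dumps(trust_policy(DECODABLE_ACCOUNT_ID), indent=2))
```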

To use this connector you must associate a permissions policy with the IAM role. This policy must have the following permissions:

  • Read/Write access to the S3 bucket path to which you’re writing data.

    s3:GetObject
    s3:PutObject
    s3:DeleteObject
    s3:PutObjectAcl

    To write data directly at the root level of the bucket, leave the path blank but keep the trailing /* (for example, arn:aws:s3:::your-bucket/*).

  • List access on the bucket to which you're writing data:

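    s3:ListBucket

  • Sample permissions policy (s3:ListBucket is a bucket-level action, so it is granted on the bucket ARN rather than on the object path):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"],
          "Resource": ["arn:aws:s3:::your-bucket/some/dir/*"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": ["arn:aws:s3:::your-bucket"]
        }
      ]
    }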
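If you manage several tables, it can help to generate this policy rather than edit it by hand. The Python sketch below is a hypothetical helper (`permission_policy` is not part of any Decodable tool) that applies the rules above: object-level actions go on the write path, an empty path becomes the bucket root with the trailing `/*` kept, and `s3:ListBucket`, being a bucket-level action, goes on the bucket ARN itself. Restricting listing further with an `s3:prefix` condition is possible but not assumed here.

```python
import json


def permission_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM permissions policy for the Delta Lake sink.

    Object actions are granted on "<bucket>/<prefix>/*"; with an empty
    prefix this becomes "<bucket>/*" (the bucket root). s3:ListBucket is
    granted on the bucket ARN, which is the resource type it applies to.
    """
    prefix = prefix.strip("/")
    object_path = f"{prefix}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"],
                "Resource": [f"arn:aws:s3:::{bucket}/{object_path}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


# Policy for the sample path, then for writing at the bucket root.
print(json.dumps(permission_policy("your-bucket", "some/dir"), indent=2))
print(json.dumps(permission_policy("your-bucket"), indent=2))
```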

Connector properties

If you use the Decodable CLI or API to create the connection, refer to the Property column for the underlying property names. The connector name is delta-lake.
Property Disposition Description

table-path

required

Path to the Delta table in your S3 bucket, using the s3a scheme.
Example: s3a://my-bucket/table_name

s3.role-arn

required

ARN of the IAM role configured as described in Prerequisites.
Example: arn:aws:iam::111222333444:role/decodable-delta-access
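Before creating the connection through the CLI or API, you may want to sanity-check the property values. The Python sketch below is a hypothetical client-side check (`check_properties` is not part of Decodable's tooling, and the role-name character set is an approximation). It confirms that both required properties are present, that table-path uses the s3a scheme, and that s3.role-arn looks like an IAM role ARN, then returns the bucket and prefix, which are the values the IAM permissions policy must cover.

```python
import re
from urllib.parse import urlparse

# Approximate shape of an IAM role ARN: 12-digit account ID, then role path/name.
ROLE_ARN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def check_properties(props: dict) -> tuple[str, str]:
    """Validate the required delta-lake properties; return (bucket, prefix)."""
    for key in ("table-path", "s3.role-arn"):
        if not props.get(key):
            raise ValueError(f"missing required property: {key}")
    url = urlparse(props["table-path"])
    if url.scheme != "s3a" or not url.netloc:
        raise ValueError("table-path must look like s3a://bucket/path")
    if not ROLE_ARN.match(props["s3.role-arn"]):
        raise ValueError("s3.role-arn must be an IAM role ARN")
    # netloc is the bucket; the path (without slashes) is the table prefix.
    return url.netloc, url.path.strip("/")


props = {
    "table-path": "s3a://my-bucket/table_name",
    "s3.role-arn": "arn:aws:iam::111222333444:role/decodable-delta-access",
}
print(check_properties(props))
```

The returned prefix is the path whose objects the permissions policy must allow the connector to read and write.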

Supported data types

The Delta Lake connector supports the data types listed here.