Delta Lake

Delta Lake is an open source project that enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing on top of existing data lakes, such as S3, ADLS, GCS, and HDFS. Specifically, Delta Lake offers:

  • ACID transactions on Spark

  • Scalable metadata handling

  • Streaming and batch unification

  • Schema enforcement

  • Upserts and deletes

Getting started

There are two types of Decodable connections: source connections and sink connections. Source connections read from an external system and write to a Decodable stream, while sink connections read from a stream and write to an external system. Delta Lake connectors can only be used in the sink role.

Configure as a sink

To create and configure a connector for Delta Lake, sign in to Decodable Web, navigate to the Connections tab, click on New Connection, and follow the steps below. For examples of using the command line tools or scripting, see the How To guides.

  1. The connector type will default to sink, since that is the only option for Delta Lake connectors.

  2. Specify the path to your Delta table in S3, using the s3a scheme. For example, s3a://my-bucket/table_path.

  3. Specify the AWS ARN of the IAM role. For example, arn:aws:iam::111222333444:role/decodable-delta-access.

For more detailed information about configuring Delta Lake, see the Delta Lake Quickstart guide and related documentation.

Reference

Connector name        delta-lake
Type                  sink
Delivery guarantee    exactly once

The Delta Lake connector streams data in Delta Lake format to an S3 bucket in your AWS account. To use it, configure an AWS IAM Role as described below, with specific permissions to write to the bucket.

Properties

The following properties are supported by the Delta Lake connector.

Property       Disposition   Description

table-path     required      Path to the Delta table in your S3 bucket, using the s3a scheme.
                             Example: s3a://my-bucket/table_name

s3.role-arn    required      AWS ARN of the IAM Role configured as described below.
                             Example: arn:aws:iam::111222333444:role/decodable-delta-access
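As a rough illustration of the two required properties, the following Python sketch checks a property map before you create a connection. The function name validate_properties and both regular expressions are assumptions inferred from the examples above (including the standard "aws" partition in the ARN); they are not the connector's actual validation rules.

```python
import re

# Assumed formats, inferred from the examples in the properties table.
# These are illustrative checks, not the connector's own validation.
TABLE_PATH_RE = re.compile(r"^s3a://[^/]+(/.*)?$")
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/.+$")


def validate_properties(props):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Both properties are listed as required.
    for key in ("table-path", "s3.role-arn"):
        if not props.get(key):
            problems.append(f"missing required property: {key}")
    path = props.get("table-path", "")
    if path and not TABLE_PATH_RE.match(path):
        problems.append("table-path should look like s3a://bucket/path")
    arn = props.get("s3.role-arn", "")
    if arn and not ROLE_ARN_RE.match(arn):
        problems.append(
            "s3.role-arn should look like arn:aws:iam::<account-id>:role/<name>"
        )
    return problems


if __name__ == "__main__":
    example = {
        "table-path": "s3a://my-bucket/table_name",
        "s3.role-arn": "arn:aws:iam::111222333444:role/decodable-delta-access",
    }
    print(validate_properties(example))  # prints []
```

A check like this only catches formatting mistakes early; it says nothing about whether the bucket exists or the role can be assumed.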

IAM role, permissions, and security

To keep your data secure, you, AWS, and Decodable each play a part in ensuring that only Delta Lake connections in your Decodable Account can write data to your S3 bucket.

How?

AWS IAM provides a mechanism called ExternalId for this purpose. You and Decodable use it, as described here, to ensure that Decodable accesses your bucket only on behalf of your Decodable Account. It works like this:

  • You’ll create and configure an IAM Role with two Policies:

    • A Trust Policy allowing access from Decodable’s AWS account — but only with an ExternalId matching your (unique) Decodable account name.

    • A Permissions Policy with the needed permissions on your bucket.

  • You’ll provide us the ARN of this Role via your Decodable Delta Lake connection’s s3.role-arn property.

  • Our servers will assume that Role using an ExternalId value matching only your Decodable Account name — never any other. We’ll use that to talk to your bucket.

Note that none of the values here are treated as secrets (by us, AWS, or you): not the ExternalId (your account name), not the Role ARN, and not the bucket name.

Specifically, the IAM Role you name in s3.role-arn must:

  • have an AssumeRole Trust Policy that:

    • names Decodable’s AWS account ID (671293015970) as Principal.

    • has a Condition requiring sts:ExternalId to equal your Decodable Account name.

  • have a Permissions Policy allowing the needed operations on the bucket ARN (not the Role ARN) and S3 key (path).
    The Policy Actions are:

    • s3:GetObject

    • s3:PutObject

    • s3:DeleteObject

    • s3:ListBucket

    • s3:PutObjectAcl
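The requirements above can be sketched as a small structural check in Python. The helper names check_trust_policy and missing_actions are hypothetical, and the logic is a deliberately simplified reading of the policy documents; it does not reproduce how AWS IAM actually evaluates policies.

```python
DECODABLE_AWS_ACCOUNT_ID = "671293015970"  # Decodable's AWS account ID, from the text
REQUIRED_ACTIONS = {
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:PutObjectAcl",
}


def _as_list(value):
    """IAM allows a single value or a list in many fields; normalize to a list."""
    return value if isinstance(value, list) else [value]


def check_trust_policy(trust_policy, account_name):
    """True if an Allow statement trusts Decodable's account with this ExternalId."""
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*", which this sketch does not accept
        principals = _as_list(principal.get("AWS", []))
        trusts_decodable = any(DECODABLE_AWS_ACCOUNT_ID in p for p in principals)
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        external_ids = _as_list(condition.get("sts:ExternalId", []))
        if trusts_decodable and account_name in external_ids:
            return True
    return False


def missing_actions(permissions_policy):
    """Return the required actions that no Allow statement grants by exact name."""
    granted = set()
    for stmt in _as_list(permissions_policy.get("Statement", [])):
        if stmt.get("Effect") == "Allow":
            granted.update(_as_list(stmt.get("Action", [])))
    return REQUIRED_ACTIONS - granted
```

Wildcard actions such as s3:* are not expanded here, so a policy that relies on them would be reported as missing actions even though AWS would accept it.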

Example trust policy

Here’s an example IAM Trust Policy. Replace my-decodable-account with your Decodable Account name. Note that 671293015970 is Decodable’s AWS account ID and must match exactly.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::671293015970:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "my-decodable-account"
        }
      }
    }
  ]
}

To allow several Decodable Accounts (say, in different AWS Regions) to write to the same bucket, use an array of Account names for the ExternalId value:

{ "sts:ExternalId": ["my-acct-1", "my-acct-2"] }
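
If you generate the Trust Policy from a script, a minimal Python sketch might look like the following. The function name build_trust_policy is hypothetical; the document shape mirrors the example Trust Policy, and the ExternalId condition is emitted as a single string for one account or as an array for several.

```python
import json

DECODABLE_AWS_ACCOUNT_ID = "671293015970"  # Decodable's AWS account ID, from the text


def build_trust_policy(account_names):
    """Build a Trust Policy dict for one or more Decodable account names."""
    names = list(account_names)
    if not names:
        raise ValueError("at least one Decodable account name is required")
    # A single name is written as a string, several names as an array.
    external_id = names[0] if len(names) == 1 else names
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{DECODABLE_AWS_ACCOUNT_ID}:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # The account names below are placeholders.
    print(json.dumps(build_trust_policy(["my-acct-1", "my-acct-2"]), indent=2))
```

When a condition key lists several values, IAM treats them as alternatives, so any one of the listed account names satisfies the condition.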

Example permissions policy

Here’s an example IAM Permissions Policy. Replace your-bucket (twice) and /some/dir appropriately. Note that the path (here: /some/dir) can be left blank to write S3 objects at the bucket root, but the trailing /* is required.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"],
      "Resource": ["arn:aws:s3:::your-bucket/some/dir/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::your-bucket"]
    }
  ]
}
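
The rule about the blank path and the required trailing /* can be sketched in Python as follows. The helper name object_resource_arn is hypothetical and simply formats the object-level Resource ARN used in the policy; it does not call AWS.

```python
def object_resource_arn(bucket, path=""):
    """Format the object-level S3 Resource ARN, always ending in /*."""
    # Drop leading and trailing slashes so "/some/dir" and "some/dir/" agree.
    prefix = path.strip("/")
    if prefix:
        return f"arn:aws:s3:::{bucket}/{prefix}/*"
    # A blank path targets objects at the bucket root.
    return f"arn:aws:s3:::{bucket}/*"


if __name__ == "__main__":
    print(object_resource_arn("your-bucket", "/some/dir"))
    # prints arn:aws:s3:::your-bucket/some/dir/*
    print(object_resource_arn("your-bucket"))
    # prints arn:aws:s3:::your-bucket/*
```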

Further reading — from AWS

For the full AWS discussion of the security problem this solves, and the AWS-recommended solution using ExternalId, see "The confused deputy problem" in the AWS Identity and Access Management documentation.

Supported types

Only the following SQL data types are supported.