BYOC setup

These are the cloud resources required by Decodable Bring Your Own Cloud (BYOC) deployments on AWS.

The recommended way to provision them is with the reference Terraform modules.

For customers not using the reference Terraform modules, these resources can be provisioned and the ARNs can be added to the Helm chart values manually.

VPC

Decodable recommends a VPC with subnets in 3 Availability Zones. We recommend using a /20 for the VPC and allocating a /24 for each private and public AZ subnet.
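The arithmetic behind this recommendation can be checked with a short sketch. A /20 contains sixteen /24 blocks, and 3 Availability Zones with one private and one public subnet each need six of them. The function name, the 10.0.0.0/20 range and the AZ names below are illustrative assumptions, not values required by Decodable.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_names: list[str]) -> dict[str, dict[str, str]]:
    """Carve one private and one public /24 per AZ out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # Iterate over the /24 blocks of the VPC range in address order.
    blocks = vpc.subnets(new_prefix=24)
    plan = {}
    for az in az_names:
        plan[az] = {"private": str(next(blocks)), "public": str(next(blocks))}
    return plan


if __name__ == "__main__":
    # Example values only: any /20 and any three AZs in the region would do.
    azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
    for az, subnets in plan_subnets("10.0.0.0/20", azs).items():
        print(az, subnets)
```

With these inputs, six of the sixteen /24 blocks are allocated, which leaves ten spare for later growth.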

VPCs must have a NAT Gateway or Internet Gateway to allow access to the Internet.

We recommend creating an S3 gateway endpoint in the VPC.

EKS

Decodable software is deployed on Kubernetes using Helm charts. The EKS cluster must:

Decodable expects a dedicated nodegroup with the taint flink-app:NoSchedule, which is used to run Flink TaskManagers. We recommend using m7gd class instances and putting EmptyDir volumes on the instance-local SSD for best performance.
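For reference, a pod can only be scheduled onto a tainted node if it carries a matching toleration. The sketch below converts a taint string in the `key[=value]:effect` form into the corresponding Kubernetes toleration. The helper name is hypothetical, and the Decodable Helm charts may already set this toleration, so treat it as an illustration of the taint rather than a required step.

```python
import json


def toleration_for_taint(taint: str) -> dict:
    """Convert a 'key[=value]:effect' taint string into a pod toleration."""
    key_part, effect = taint.rsplit(":", 1)
    key, _, value = key_part.partition("=")
    toleration = {"key": key, "effect": effect}
    if value:
        toleration.update({"operator": "Equal", "value": value})
    else:
        # The taint has no value, so tolerate the key regardless of value.
        toleration["operator"] = "Exists"
    return toleration


print(json.dumps(toleration_for_taint("flink-app:NoSchedule"), indent=2))
```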

MSK

Decodable uses MSK to store internal topics with pipeline state. These are distinct from “external” topics, which you connect to using the Decodable Apache Kafka connectors.

Guidelines for the Decodable MSK:

  • Use Kafka 3.5.1 if possible

  • Use Zookeeper for state (don’t enable KRaft)

  • Use brokers of size kafka.m5.large or kafka.m7g.large (or larger)

  • Enable volume auto-scaling for the brokers

  • Ensure there is network connectivity from the EKS cluster to MSK

    • The EKS and MSK clusters should be in the same region to avoid excess latency

    • Configure Security Groups to allow connectivity on ports 9098 (IAM authentication) and 9096 (SASL/SCRAM authentication)

  • Enable both IAM and SASL/SCRAM authentication
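One way to confirm the network path is to attempt a TCP connection to each broker on both ports from a pod or host inside the EKS cluster's network. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the check only proves that the port is reachable, not that authentication succeeds.

```python
import socket

# Ports from the guidelines above, with the MSK auth mechanism on each.
MSK_PORTS = {9098: "IAM", 9096: "SASL/SCRAM"}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_brokers(brokers: list[str]) -> dict[tuple[str, int], bool]:
    """Check every broker hostname against every MSK port."""
    return {(b, p): can_connect(b, p) for b in brokers for p in MSK_PORTS}


# Usage (broker hostnames come from the MSK cluster's bootstrap string):
#   for (host, port), ok in check_brokers(["<broker hostname>"]).items():
#       print(host, port, MSK_PORTS[port], "reachable" if ok else "BLOCKED")
```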

S3

Decodable uses S3 to store job state, crash data and job logs. Create 3 distinct buckets for these purposes.

Guidelines for Decodable S3 buckets:

  • Enable retention policies for the crash data and job log buckets to limit bucket size

  • Disable public access for all of these buckets

  • Create these buckets in the same region as the Decodable deployment to reduce costs
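The first two guidelines map onto two S3 bucket settings: a lifecycle configuration and a public access block. The sketch below builds both as plain dictionaries shaped like the request bodies of the S3 PutBucketLifecycleConfiguration and PutPublicAccessBlock operations, so they can be passed to an AWS SDK or translated into Terraform. The 30-day retention period is an assumed example, and the rule is meant for the crash data and job log buckets only, not the job state bucket.

```python
def expiration_rule(days: int) -> dict:
    """Lifecycle configuration that deletes objects `days` after creation."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "Expiration": {"Days": days},
            }
        ]
    }


# All four switches on: no public ACLs and no public bucket policies.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# Assumed retention period; pick one that suits your debugging needs.
crash_and_log_lifecycle = expiration_rule(30)
```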

Secrets Manager

For Decodable to authenticate to MSK using SASL/SCRAM credentials, it’s necessary to create a KMS key and encrypt a Secret in AWS Secrets Manager. This secret must be associated with the MSK cluster created above, so the username/password can be used for authentication.

The secret should be of the form:

{
  "username": "vector",
  "password": "<16 character random string>"
}
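A secret value of this form can be generated with a few lines of Python. This is a sketch under assumptions: the helper name is hypothetical, and the alphabet is limited to letters and digits to avoid characters that would need escaping.

```python
import json
import secrets
import string


def make_scram_secret(username: str = "vector", length: int = 16) -> str:
    """Return the JSON secret string with a random alphanumeric password."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source.
    password = "".join(secrets.choice(alphabet) for _ in range(length))
    return json.dumps({"username": username, "password": password})


if __name__ == "__main__":
    print(make_scram_secret())
```

When storing the value, note that AWS documents two constraints for secrets associated with an MSK cluster: the secret name must begin with AmazonMSK_, and the secret must be encrypted with a customer managed KMS key rather than the default Secrets Manager key.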

IAM

Policies

These policies reference the resources created above.

decodable_kafka_data

Give Decodable access to create topics, consumer groups and transactions in MSK. This is necessary for Decodable to manage state stored in internal topics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeClusterDynamicConfiguration",
                "kafka-cluster:DescribeCluster",
                "kafka-cluster:Connect"
            ],
            "Resource": "arn:aws:kafka:<region>:<account>:cluster/<cluster>"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData",
                "kafka-cluster:*Topic*"
            ],
            "Resource": "arn:aws:kafka:<region>:<account>:topic/<cluster>/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeGroup",
                "kafka-cluster:AlterGroup"
            ],
            "Resource": "arn:aws:kafka:<region>:<account>:group/<cluster>/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeTransactionalId",
                "kafka-cluster:AlterTransactionalId"
            ],
            "Resource": "arn:aws:kafka:<region>:<account>:transactional-id/<cluster>/*"
        }
    ]
}
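When filling in policies like this one by hand, it is easy to leave a placeholder behind or break the JSON. The sketch below substitutes the `<region>`, `<account>` and `<cluster>` placeholders in a shortened copy of the policy and fails if the result is not valid JSON or still contains a placeholder. The function name and all example values are hypothetical. The sketch also assumes that `<cluster>` stands for the cluster name followed by its UUID, as they appear in the MSK cluster ARN.

```python
import json

# Shortened copy of the first statement above, used as a template.
POLICY_TEMPLATE = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kafka-cluster:Connect", "kafka-cluster:DescribeCluster"],
            "Resource": "arn:aws:kafka:<region>:<account>:cluster/<cluster>"
        }
    ]
}
"""


def render_policy(template: str, values: dict[str, str]) -> dict:
    """Fill <name> placeholders and return the parsed policy document."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace(f"<{name}>", value)
    policy = json.loads(rendered)  # raises ValueError on malformed JSON
    # Fail loudly if any <placeholder> survived the substitution.
    leftovers = [s["Resource"] for s in policy["Statement"] if "<" in str(s["Resource"])]
    if leftovers:
        raise ValueError(f"unfilled placeholders in: {leftovers}")
    return policy


if __name__ == "__main__":
    example = {
        "region": "us-east-1",
        "account": "123456789012",  # made-up account ID
        "cluster": "example-cluster/11111111-2222-3333-4444-555555555555",
    }
    print(json.dumps(render_policy(POLICY_TEMPLATE, example), indent=4))
```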

decodable_msk_sasl_scram

Give Decodable access to Secrets Manager to access SASL/SCRAM credentials. These are used to access MSK in components which don’t support IAM authentication.

{
    "Statement": [
        {
            "Action": "secretsmanager:GetSecretValue",
            "Effect": "Allow",
            "Resource": "arn:aws:secretsmanager:<region>:<account>:secret:<sasl_scram_secret_id>",
            "Sid": ""
        },
        {
            "Action": "kms:Decrypt",
            "Effect": "Allow",
            "Resource": "arn:aws:kms:<region>:<account>:key/<encryption_key>",
            "Sid": ""
        }
    ],
    "Version": "2012-10-17"
}

decodable_s3

Give Decodable access to the S3 buckets where the job state is stored, and where heap dumps and job logs are uploaded.

{
    "Statement": [
        {
            "Action": "s3:ListBucket",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<state bucket>",
                "arn:aws:s3:::<debug bucket>",
                "arn:aws:s3:::<log bucket>"
            ],
            "Sid": ""
        },
        {
            "Action": "s3:*Object*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<state bucket>/*",
                "arn:aws:s3:::<debug bucket>/*",
                "arn:aws:s3:::<log bucket>/*"
            ],
            "Sid": ""
        }
    ],
    "Version": "2012-10-17"
}

decodable_secrets_manager

Decodable stores internal secrets (such as credentials for authenticating to sources) in AWS Secrets Manager.

{
    "Statement": [
        {
            "Action": "secretsmanager:*",
            "Effect": "Allow",
            "Resource": "arn:aws:secretsmanager:<region>:<account>:secret:decodable/user/account/*",
            "Sid": ""
        },
        {
            "Action": [
                "secretsmanager:List*",
                "secretsmanager:Get*",
                "secretsmanager:Describe*"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:secretsmanager:<region>:<account>:secret:decodable/system/*",
            "Sid": ""
        }
    ],
    "Version": "2012-10-17"
}

Roles

Decodable uses IAM Roles for Service Accounts to authenticate pods to AWS services. This table describes the roles and the required policies:

AWS Role               Kubernetes Service Account         Policies
data_plane_api         <namespace>:data-plane-api         decodable_kafka, decodable_s3, decodable_secrets_manager
data_plane_controller  <namespace>:data-plane-controller  decodable_kafka, decodable_s3, decodable_secrets_manager, decodable_msk_sasl_scram
flink                  <namespace>:flink                  decodable_kafka, decodable_s3
flink_cupi             byoj-<account ID>-*:flink          decodable_kafka, decodable_s3
vector                 <namespace>:vector                 decodable_msk_sasl_scram, decodable_s3
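With IAM Roles for Service Accounts, each role also needs a trust policy that lets the matching Kubernetes service account assume it through the cluster's OIDC provider. The sketch below builds such a trust policy in the standard IRSA shape. The function name and example values are hypothetical. `oidc_issuer` is the EKS cluster's OIDC issuer URL without the https:// prefix. A wildcard namespace, as used by flink_cupi, needs a StringLike condition instead of StringEquals.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy allowing one service account to assume an IAM role."""
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    # Exact names use StringEquals; wildcards only match under StringLike.
    operator = "StringLike" if "*" in subject else "StringEquals"
    condition = {"StringEquals": {f"{oidc_issuer}:aud": "sts.amazonaws.com"}}
    condition.setdefault(operator, {})[f"{oidc_issuer}:sub"] = subject
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": condition,
            }
        ],
    }


if __name__ == "__main__":
    # Made-up account ID, issuer and namespace, for illustration only.
    issuer = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    print(json.dumps(irsa_trust_policy("123456789012", issuer, "decodable", "flink"), indent=4))
```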