# Amazon S3 source connector

## Features

| Feature | Details |
|---|---|
| Connector name | `s3-v2` |
| Delivery guarantee | Exactly once |
| Supported task sizes | S, M, L (without SQS); M, L (with SQS) |
| Multiplex capability | A single instance of this connector can read from all files in a single S3 bucket |
| Supported stream types | Append stream |

For better performance, enable an Amazon SQS (Simple Queue Service) queue so that the connector can receive notifications when new files are written to the S3 bucket.

There is no defined order of ingestion for the files read from S3.

## Configuration properties

| Property | Description | Required | Default |
|---|---|---|---|
| `region` | The AWS region that your S3 bucket is located in. For example, `us-west-2`. | — | Your Decodable account region |
| `path` | The file path to the bucket or directory that you want to read data from. For example, `s3://bucket/directory`. | Yes | — |
| `role-arn` | The AWS ARN of the IAM role that has permissions to access the S3 bucket. For example, `arn:aws:iam::111222333444:role/decodable-s3-access`. | Yes | — |
| `format` | The format of the data to read from S3. Must be one of the following: `json`, `parquet`, `avro`. | Yes | — |
| `partition-cols` | The field names that partition the data. For example, if you want to partition your data based on the `datetime` field, then enter `datetime`. | — | — |

### JSON-specific configuration

| Property | Description | Required | Default |
|---|---|---|---|
| `json.timestamp-format.standard` | The timestamp format for `TIMESTAMP` and `TIMESTAMP_LTZ` types. `SQL` parses timestamps in the `yyyy-MM-dd HH:mm:ss.SSS` format, for example `"2020-12-30 12:13:14.123"`. `ISO-8601` parses timestamps in the `yyyy-MM-ddTHH:mm:ss.SSS` format, for example `"2020-12-30T12:13:14.123"`. | — | `SQL` |

### Reading data from S3

| Property | Description | Required | Default |
|---|---|---|---|
| `source.monitor-interval` | Without SQS: how often, in seconds, to scan the S3 bucket for new files. With SQS: how often, in seconds, events are polled from the queue. | — | 10 |
| `source.sqs-url` | The SQS queue URL. Only applicable if you are connecting to an SQS-enabled S3 bucket. | — | — |
| `source.scan-on-startup` | Whether to scan the specified bucket for existing files before relying on SQS for future file notifications. `true`: the connection scans the S3 bucket and ingests any historical data it contains, as well as all new data that arrives. `false`: only new data received after the SQS queue begins to receive file notifications is ingested. | — | `true` |

## Prerequisites

### Access to your AWS resources

Decodable interacts with resources in AWS on your behalf. To do this, you need an IAM role configured with a trust policy that allows access from Decodable's AWS account, and a permission policy as detailed below. For more details on how this works, how to configure the trust policy, and example steps to follow, see here.

To use this connector, you must associate a permissions policy with the IAM role. The policy must grant the following permissions:

- Read access (`s3:GetObject`) on the S3 bucket path that you're reading data from. If you want to read data directly at the root level of the bucket, leave the path blank but keep the trailing `/*`.
- List access (`s3:ListBucket`) on the bucket that you're reading data from.

Sample permission policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my_bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my_bucket/some/dir/*"
    }
  ]
}
```
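If you manage the IAM role programmatically, one way to attach the sample policy above is as an inline role policy with boto3. This is a minimal sketch, not part of the connector itself; the role name, policy name, bucket, and prefix below are placeholders you would replace with your own values.

```python
import json

import boto3

# Hypothetical names -- replace with your own role, bucket, and prefix.
ROLE_NAME = "decodable-s3-access"
BUCKET = "my_bucket"
PREFIX = "some/dir"

# Same permissions as the sample policy above: list the bucket, and
# read objects under the prefix that the connector will ingest from.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}/*",
        },
    ],
}

iam = boto3.client("iam")

# Attach the policy inline to the role that Decodable assumes.
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="decodable-s3-read",
    PolicyDocument=json.dumps(policy),
)
```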
## Detecting bucket changes with SQS

By default, the Amazon S3 connector monitors files in the specified bucket by scanning the entire bucket at a user-defined interval. For S3 buckets with a large number of files, however, this approach can be costly and inefficient.

An alternative is to configure the Amazon S3 connector to detect and pick up new files using Amazon SQS. The queue must be configured to receive event notifications for new files from the S3 bucket. When this option is enabled, the connector performs an initial full scan of the S3 bucket when it's started for the first time, or whenever state is discarded on startup. After this initial scan, the connector discovers new files to read exclusively from the file notifications arriving in the SQS queue. This option is recommended for long-lived and high-scale workloads.

During connection creation, you can disable the initial full bucket scan by setting the `source.scan-on-startup` property to `false`.

The following prerequisites apply if you want to get data from an SQS-enabled S3 bucket:

1. You must have a standard queue type with the following access policy.

   ```json
   {
     "Version": "2012-10-17",
     "Id": "example-ID",
     "Statement": [
       {
         "Sid": "example-statement-ID",
         "Effect": "Allow",
         "Principal": {
           "Service": "s3.amazonaws.com"
         },
         "Action": "SQS:SendMessage",
         "Resource": "arn:aws:sqs:<YOUR_SQS_QUEUE_ARN_HERE>",
         "Condition": {
           "ArnLike": {
             "aws:SourceArn": "arn:aws:s3:*:*:*"
           }
         }
       }
     ]
   }
   ```

2. Update your IAM role to include a permissions policy for SQS.

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "Statement1",
         "Effect": "Allow",
         "Action": [
           "sqs:ReceiveMessage",
           "sqs:DeleteMessage"
         ],
         "Resource": [
           "arn:aws:sqs:<YOUR_SQS_QUEUE_ARN_HERE>"
         ]
       }
     ]
   }
   ```

3. Configure your S3 bucket to send event notification messages to the SQS queue you configured in Step 1 whenever new files arrive that match your path prefix. When configuring event notifications, make sure that you select the check box for All object create events under Event Types. For the best performance, don't select any other event types. See Enabling and configuring event notifications using the Amazon S3 console in the AWS documentation for information on how to enable notification messages. A scripted sketch of this wiring is also shown at the end of this page.

Once activated, this connection consumes and deletes messages from your queue during file processing. It's not recommended that other applications share the same SQS queue.

## Connector starting state and offsets

When you create a connection, it reads the entire contents of the S3 bucket by default. If you are using SQS, you can customize this behavior with the `source.scan-on-startup` configuration property. Learn more about starting state here.
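For reference, the following is a minimal boto3 sketch of the S3-to-SQS wiring described in the steps above: it creates a standard queue, attaches an access policy like the one in Step 1, and configures the bucket to send All object create events for a prefix, as in Step 3. The queue, bucket, and prefix names are placeholders, and the sketch is an illustration under those assumptions rather than a required part of the connector setup.

```python
import json

import boto3

# Hypothetical names -- replace with your own queue, bucket, and prefix.
QUEUE_NAME = "decodable-s3-notifications"
BUCKET = "my_bucket"
PREFIX = "some/dir/"

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

# Step 0: create a standard queue and look up its ARN.
queue_url = sqs.create_queue(QueueName=QUEUE_NAME)["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Step 1: allow S3 to send event notification messages to the queue
# (same shape as the access policy shown above).
access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "SQS:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnLike": {"aws:SourceArn": "arn:aws:s3:*:*:*"}},
        }
    ],
}
sqs.set_queue_attributes(
    QueueUrl=queue_url, Attributes={"Policy": json.dumps(access_policy)}
)

# Step 3: send "All object create events" for the prefix to the queue.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": PREFIX}]}
                },
            }
        ]
    },
)
```

Step 2 (granting the connector's IAM role `sqs:ReceiveMessage` and `sqs:DeleteMessage` on the queue) can be applied the same way as the permission policy shown earlier on this page.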