AWS S3
Use the Amazon S3 Connector to send data from Decodable to an Amazon S3 bucket, or to read data from an S3 bucket into Decodable. This updated version of the connector lets you configure a file rollover policy and offers enhanced security compared to the original Amazon S3 Connector.
Overview
Connector name | s3-v2 |
Type | source, sink |
Delivery guarantee | exactly once |
Create a connection to Amazon S3 using the Amazon S3 Connector
Prerequisites
Before you can create the Amazon S3 connection, you must have an Identity and Access Management (IAM) role with the following policies. See the Setting up an IAM User section for more information.
- A Trust Policy that allows access from Decodable’s AWS account. The ExternalId must match your Decodable account name.
- A Permissions Policy with read and write permissions for the destination bucket.
Setting up an IAM User
The following is an example of what the Trust Policy should look like. Replace <MY_DECODABLE_ACCOUNT_NAME> with your own Decodable account name. In the example, 671293015970 is Decodable’s AWS account ID and cannot be changed.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::671293015970:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<MY_DECODABLE_ACCOUNT_NAME>"
}
}
}
]
}
Note: To allow several Decodable accounts (say, in different AWS regions) to write to the same bucket, use an array of account names for the ExternalId value:
{ "sts:ExternalId": ["my-acct-1", "my-acct-2"] }
See The confused deputy problem in the AWS Identity and Access Management documentation for more information about why the ExternalId value is required.
You must also have read and write permissions on the destination S3 bucket. See the following list of permissions and replace <YOUR_BUCKET> and /some/dir appropriately. If you want to send data directly to the root level of the bucket, then leave the path blank but keep the trailing /*.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
"Resource": "arn:aws:s3:::<YOUR_BUCKET>/some/dir/*"
},
{
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": "arn:aws:s3:::<YOUR_BUCKET>"
}
]
}
Steps
- If you have an existing connection using the older version of the Amazon S3 Connector and want to switch to the newest version, do the following. If you are not upgrading an existing connection, skip these steps.
  - Stop the existing Amazon S3 connection.
  - Stop any pipelines that are using it.
- From the Connections page, select the Amazon S3 Connector and complete the following fields. (A Decodable CLI equivalent of these fields is sketched after these steps.)
UI Field | Property Name in the Decodable CLI | Description |
---|---|---|
AWS Region | region | Optional. The AWS region that your S3 bucket is located in. If not specified, defaults to your Decodable account region. For example, us-west-2. |
Path | path | The file path to the bucket or directory that you want to send data to. For example, s3://bucket/directory. |
IAM Role ARN | role-arn | The AWS ARN of the IAM role. For example, arn:aws:iam::111222333444:role/decodable-s3-access. |
Partition Template | partition-cols | Optional. The field names that you want to use to partition your data. For example, if you want to partition your data based on the datetime field, then enter datetime. See the S3 object key partitioning section for more information. |
Value Format | format | The format for data in Amazon S3. You can select one of the following: JSON (see JSON Format Properties for additional properties you can specify), Parquet (see Parquet Format Properties for additional properties you can specify), or Raw. |
Source folder polling frequency | source.monitor-interval | How often to scan the S3 bucket for new files. Defaults to 10 seconds. |
- Select which stream contains the records that you’d like to send to Amazon S3. Then, select Next.
- Give the newly created connection a Name and Description and select Save.
- If you are replacing an existing Amazon S3 connection, then restart any pipelines that were processing data for the previous connection.
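If you prefer to work from the command line, the following is a minimal sketch of how the fields above could map onto a Decodable CLI invocation. The property names (path, role-arn, region, partition-cols, format) come from the table above; the exact flag syntax shown here (for example --prop and --stream-id) is an assumption, so check the Decodable CLI reference for the authoritative form. The example values are the same ones used earlier on this page.
# A minimal sketch, not verified against a specific CLI version; flag syntax is
# an assumption. Property names match the table above.
decodable connection create \
  --name s3-sink-example \
  --connector s3-v2 \
  --type sink \
  --stream-id <YOUR_STREAM_ID> \
  --prop path=s3://bucket/directory \
  --prop role-arn=arn:aws:iam::111222333444:role/decodable-s3-access \
  --prop region=us-west-2 \
  --prop partition-cols=datetime \
  --prop format=json
A source connection would use --type source instead and can set source.monitor-interval the same way.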
Reference
JSON Format Properties
The following properties are only applicable when format=json.
Property | Required? | Description |
---|---|---|
json.timestamp-format.standard | Optional | Specify the timestamp format for TIMESTAMP and TIMESTAMP_LTZ types. Defaults to ISO-8601. |
json.encode.decimal-as-plain-number | Optional | Must be true or false, defaults to false. When true, numbers are always encoded without scientific notation. For example, a number encoded as 2.7E-8 by default would be encoded as 0.000000027. |
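The JSON format properties are passed the same way as any other connection property. The sketch below is again hypothetical as far as the CLI flag syntax goes; the property names and values come from the table above.
# Hypothetical sketch: flag syntax is an assumption; the json.* property names
# come from the table above.
decodable connection create \
  --name s3-json-sink-example \
  --connector s3-v2 \
  --type sink \
  --stream-id <YOUR_STREAM_ID> \
  --prop path=s3://bucket/directory \
  --prop role-arn=arn:aws:iam::111222333444:role/decodable-s3-access \
  --prop format=json \
  --prop json.timestamp-format.standard=ISO-8601 \
  --prop json.encode.decimal-as-plain-number=true
With json.encode.decimal-as-plain-number=true, a value that would otherwise be written as 2.7E-8 is written as 0.000000027.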
Parquet Format Properties
The following properties are only applicable when format=parquet.
Property | Description |
---|---|
parquet.compression | Options are SNAPPY, GZIP, and LZO. Defaults to no compression. |
Other Parquet options are available. See ParquetOutputFormat for more information.
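As with the JSON properties, parquet.compression is set as an ordinary connection property. The flag syntax below is an assumption; the property name and the SNAPPY value come from the table above.
# Hypothetical sketch: flag syntax is an assumption; parquet.compression and
# its SNAPPY value come from the table above.
decodable connection create \
  --name s3-parquet-sink-example \
  --connector s3-v2 \
  --type sink \
  --stream-id <YOUR_STREAM_ID> \
  --prop path=s3://bucket/directory \
  --prop role-arn=arn:aws:iam::111222333444:role/decodable-s3-access \
  --prop format=parquet \
  --prop parquet.compression=SNAPPY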