MongoDB CDC
MongoDB® is a NoSQL document database used for high-volume data storage. As a document store, MongoDB uses collections and documents rather than tables and rows. It offers a flexible data model for storing JSON-like data and provides indexing and replication.
Getting Started
Connections come in two flavors: source and sink. Source connections read from an external system and write to a Decodable stream, while sink connections read from a stream and write to an external system. MongoDB CDC connectors can only be used in the source role.
Your MongoDB instance must be configured for change stream replication. Please consult the MongoDB documentation for more information.
Configure As A Source
To create and configure a connector for MongoDB CDC, sign in to the Decodable Web Console, navigate to the Connections tab, click on New Connection, and follow the steps below. For examples of using the command line tools or scripting, see the How To guides.
- The connector type will default to source, since that is the only option for MongoDB CDC connectors.
- Specify the hosts of your MongoDB instances. Note that at present only the Standard Connection String Format for hosts is supported. If you have a DNS Seed List URL, you must find the underlying MongoDB instance hostnames and add them to the hosts list.
  - example: mongodb-1.tld:27017,mongodb-2.tld:27017,mongodb-3.tld:27017
- Provide the Database and Collection you want to capture CDC events from.
- Provide the username and password of the user on whose behalf the connection is being made. Note that the MongoDB privileges changeStream and read are required for this user.
- Choose whether the existing collection records should be copied as well. This defaults to 'true'. (See also: Migrating existing collections)
- Specify any additional MongoDB Connection String Options needed to connect to the MongoDB cluster. By default TLS/SSL communication is not enabled, but it can be enabled by including ssl=true in this field.
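To sanity-check the hosts and connection options before entering them, the values from the steps above can be assembled into a Standard Connection String Format URI. The following is a minimal Python sketch; the function names parse_hosts and build_mongodb_uri and the sample values are hypothetical, and this is not a description of how Decodable builds its connection internally.

```python
from urllib.parse import quote_plus


def parse_hosts(hosts: str) -> list[tuple[str, int]]:
    """Split a comma-delimited hosts string into (hostname, port) pairs."""
    pairs = []
    for entry in hosts.split(","):
        host, sep, port = entry.strip().rpartition(":")
        # Reject entries without a port, or with extra colons
        # (for example two hosts joined by ':' instead of ',').
        if not sep or not host or ":" in host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


def build_mongodb_uri(hosts: str, username: str, password: str, options: str = "") -> str:
    """Assemble a mongodb:// URI; credentials are percent-encoded."""
    host_list = ",".join(f"{host}:{port}" for host, port in parse_hosts(hosts))
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    uri = f"mongodb://{credentials}@{host_list}/"
    return f"{uri}?{options}" if options else uri


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_mongodb_uri("mongodb-1.tld:27017,mongodb-2.tld:27017", "cdc_user", "s3cret", "ssl=true"))
```

A hosts value that separates two entries with a colon instead of a comma raises ValueError here, which is an easy mistake to make when copying host lists by hand.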
Your MongoDB instance may require configuring network access from Decodable IP space. Please contact [email protected] or join our Slack community and we can provide these values to you.
Schema
The MongoDB CDC connector requires a schema field named _id declared as NOT NULL PRIMARY KEY, with a data type compatible with the underlying data.
Other schema fields can be declared to match the underlying MongoDB collection schema. For example, a collection with records looking like:
{
"_id": 1004,
"first_name": "Anne Marie",
"last_name": "Kretchmar",
"email": "[email protected]"
}
A collection with such records would require a schema definition like:
Field Name | Field Type |
---|---|
_id | BIGINT NOT NULL PRIMARY KEY |
first_name | STRING |
last_name | STRING |
email | STRING |
See also: Data Type Mappings
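As a rough illustration of the schema rules above, the following Python sketch derives a field list from a sample document. The function name infer_schema is hypothetical, the mapping covers only a few scalar types, and treating every Python int as BIGINT is an assumption made for this example (INT may be the better fit for smaller values). The sample email address is a placeholder.

```python
# Assumed mapping from Python value types to Decodable field types.
# Only a few scalar types are covered in this sketch.
TYPE_NAMES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "STRING"}


def infer_schema(document: dict) -> list[tuple[str, str]]:
    """Return (field name, field type) pairs for a sample document."""
    if "_id" not in document:
        raise ValueError("the schema requires an _id field")
    schema = []
    for name, value in document.items():
        field_type = TYPE_NAMES.get(type(value))
        if field_type is None:
            raise TypeError(f"no mapping in this sketch for field {name!r}")
        if name == "_id":
            # The connector requires _id to be the non-null primary key.
            field_type += " NOT NULL PRIMARY KEY"
        schema.append((name, field_type))
    return schema


if __name__ == "__main__":
    sample = {
        "_id": 1004,
        "first_name": "Anne Marie",
        "last_name": "Kretchmar",
        "email": "anne@example.com",  # placeholder value
    }
    for name, field_type in infer_schema(sample):
        print(f"{name} | {field_type} |")
```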
Reference
Connector name | mongodb-cdc |
Type | source |
Delivery guarantee | exactly once |
Properties
The following properties are supported by the MongoDB CDC connector.
Property | Required | Description |
---|---|---|
hosts | required | Hosts to connect to (comma delimited with ports) |
database | required | Database containing the collection |
collection | required | The name of the collection to use |
username | required | Username to use for authentication |
password | required | Password to use for authentication |
connection.options | optional | Additional connection options (e.g. ssl=true ) |
copy.existing | optional | Copy existing data from the collection (default: true ) |
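When connections are scripted, it can help to check a set of properties against the table above before submitting them. The Python sketch below does only that; the function name check_properties and the sample values are hypothetical, and it does not call any Decodable API.

```python
REQUIRED = ("hosts", "database", "collection", "username", "password")
OPTIONAL = ("connection.options", "copy.existing")


def check_properties(props: dict[str, str]) -> dict[str, str]:
    """Validate connector properties and fill in the documented default."""
    missing = [name for name in REQUIRED if not props.get(name)]
    unknown = [name for name in props if name not in REQUIRED + OPTIONAL]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    if unknown:
        raise ValueError(f"unsupported properties: {', '.join(unknown)}")
    # copy.existing defaults to true when it is not set explicitly.
    return {"copy.existing": "true", **props}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(check_properties({
        "hosts": "mongodb-1.tld:27017,mongodb-2.tld:27017",
        "database": "inventory",
        "collection": "customers",
        "username": "cdc_user",
        "password": "s3cret",
        "connection.options": "ssl=true",
    }))
```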
Data Type Mappings
Below is a mapping from the data types found in Decodable streams to their corresponding types in MongoDB. See the documentation for more information on MongoDB's types and Decodable's supported types.
Decodable Type | MongoDB Type |
---|---|
INT | Int |
BIGINT | Long |
FLOAT | Long |
DOUBLE | Double |
DECIMAL | Decimal128 |
BOOLEAN | Boolean |
DATE | Date, Timestamp |
TIME | Date, Timestamp |
TIMESTAMP(3) | Date |
TIMESTAMP_LTZ(3) | Date |
TIMESTAMP(0) | Timestamp |
TIMESTAMP_LTZ(0) | Timestamp |
STRING | String |
STRING | ObjectId |
STRING | UUID |
STRING | Symbol |
STRING | MD5 |
STRING | JavaScript |
STRING | Regex |
BYTES | BinData |
ROW | Object |
ARRAY | Array |
ROW<$ref STRING, $id STRING> | DBPointer |
(Point) ROW<type STRING, coordinates ARRAY> | GeoJson |
(Line) ROW<type STRING, coordinates ARRAY> | GeoJson |
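Because several MongoDB types can map to more than one Decodable type, it can be convenient to look the table up in the other direction when declaring a schema. The Python sketch below does this for a subset of the rows above; the names MAPPINGS and decodable_types_for are hypothetical.

```python
# A subset of the table above: (Decodable type, MongoDB types).
MAPPINGS = [
    ("INT", ["Int"]),
    ("BIGINT", ["Long"]),
    ("DOUBLE", ["Double"]),
    ("DECIMAL", ["Decimal128"]),
    ("BOOLEAN", ["Boolean"]),
    ("DATE", ["Date", "Timestamp"]),
    ("TIME", ["Date", "Timestamp"]),
    ("TIMESTAMP(3)", ["Date"]),
    ("TIMESTAMP(0)", ["Timestamp"]),
    ("STRING", ["String", "ObjectId", "UUID"]),
    ("BYTES", ["BinData"]),
]


def decodable_types_for(mongo_type: str) -> list[str]:
    """List the Decodable types that a MongoDB type can be declared as."""
    return [decodable for decodable, mongo in MAPPINGS if mongo_type in mongo]


if __name__ == "__main__":
    for mongo_type in ("Date", "ObjectId", "Long"):
        print(mongo_type, "->", decodable_types_for(mongo_type))
```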