MongoDB CDC

MongoDB® is a NoSQL document database used for high-volume data storage. As a document store, MongoDB makes use of collections and documents rather than tables and rows. It offers a flexible data model for storing JSON-like data and provides indexing and replication.

Getting Started

Connections come in two flavors: source and sink. Source connections read from an external system and write to a Decodable stream, while sink connections read from a stream and write to an external system. MongoDB CDC connectors can only be used in the source role.

Your MongoDB instance must be configured for change stream replication. Please consult the MongoDB documentation for more information.

Configure As A Source

To create and configure a connector for MongoDB CDC, sign in to the Decodable Web Console, navigate to the Connections tab, click on New Connection, and follow the steps below. For examples of using the command line tools or scripting, see the How To guides.

  1. The connector type will default to source, since that is the only option for MongoDB CDC connectors.

  2. Specify the hosts of your MongoDB instances. Note that at present only the Standard Connection String Format is supported for hosts. If you have a DNS Seed List URL, you must find the underlying MongoDB instance hostnames and add them to the hosts list.

  • example: mongodb-1.tld:27017,mongodb-2.tld:27017,mongodb-3.tld:27017

  3. Provide the Database and Collection you want to capture CDC events from.

  4. Provide the username and password of the user on whose behalf the connection is being made. Note that the MongoDB privileges changeStream and read are required for this user.

  5. Choose whether the existing collection records should be copied as well. This defaults to 'true'.
    See also: Migrating existing collections

  6. Specify any additional MongoDB Connection String Options needed to connect to the MongoDB cluster. By default, TLS/SSL communication is not enabled, but it can be enabled by including ssl=true in this field.
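The hosts value described in the steps above is a comma-delimited list of host:port entries. The following is a minimal Python sketch of building and sanity-checking such a value before pasting it into the form. The helper names `format_hosts` and `check_hosts` are hypothetical and are not part of Decodable or MongoDB tooling; the check only looks at the shape of the string and does not contact any server.

```python
def format_hosts(hosts):
    """Join (hostname, port) pairs into a comma-delimited hosts value."""
    parts = []
    for hostname, port in hosts:
        # Reject characters that would break the host:port,host:port layout.
        if not hostname or any(c in hostname for c in ",:/ "):
            raise ValueError(f"invalid hostname: {hostname!r}")
        if not 1 <= int(port) <= 65535:
            raise ValueError(f"invalid port for {hostname}: {port}")
        parts.append(f"{hostname}:{int(port)}")
    if not parts:
        raise ValueError("at least one host is required")
    return ",".join(parts)


def check_hosts(value):
    """Return True if every comma-separated entry looks like host:port."""
    for entry in value.split(","):
        hostname, sep, port = entry.partition(":")
        if not hostname or not sep or not port.isdigit():
            return False
    return True


hosts_value = format_hosts([
    ("mongodb-1.tld", 27017),
    ("mongodb-2.tld", 27017),
    ("mongodb-3.tld", 27017),
])
print(hosts_value)
print(check_hosts(hosts_value))
```

A check like this catches a common slip, such as a colon typed where a comma separator belongs, because the malformed entry no longer ends in a purely numeric port.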

You may need to configure your MongoDB instance to allow network access from the Decodable IP space. Please contact Decodable support or join our Slack community, and we can provide these values to you.

Schema

The MongoDB CDC connector requires a schema field named _id declared as NOT NULL PRIMARY KEY, with a data type compatible with the underlying data.

Other schema fields can be declared to match the underlying MongoDB collection schema. For example, a collection with records looking like:

{
   "_id": 1004,
   "first_name": "Anne Marie",
   "last_name": "Kretchmar",
   "email": "[email protected]"
}

Would require a schema definition like:

Field Name    Field Type
_id           BIGINT NOT NULL PRIMARY KEY
first_name    STRING
last_name     STRING
email         STRING

See also: Data Type Mappings
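As a rough illustration of how a sample record relates to its schema, the Python sketch below derives field types from the values in one document. The function `infer_schema` is hypothetical and is not something the connector provides. It assumes that integers map to BIGINT (as in the example above), and it handles only a few scalar types. The email value is a placeholder.

```python
def infer_schema(document):
    """Map the fields of one sample document to Decodable field types."""
    if "_id" not in document:
        raise ValueError("the schema requires an _id field")
    schema = []
    for name, value in document.items():
        # bool is checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            field_type = "BOOLEAN"
        elif isinstance(value, int):
            field_type = "BIGINT"  # assumption: treat all integers as 64-bit
        elif isinstance(value, float):
            field_type = "DOUBLE"
        elif isinstance(value, str):
            field_type = "STRING"
        else:
            raise TypeError(f"no rule for field {name!r}: {type(value).__name__}")
        if name == "_id":
            field_type += " NOT NULL PRIMARY KEY"
        schema.append((name, field_type))
    return schema


sample = {
    "_id": 1004,
    "first_name": "Anne Marie",
    "last_name": "Kretchmar",
    "email": "user@example.com",  # placeholder value
}
schema = infer_schema(sample)
for name, field_type in schema:
    print(f"{name:<12}{field_type}")
```

A single record is only a hint: fields that are absent or differently typed in other documents of the collection still need to be checked by hand.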

Reference

Connector name        mongodb-cdc
Type                  source
Delivery guarantee    exactly once

Properties

The following properties are supported by the MongoDB CDC connector.

Property              Required    Description
hosts                 required    Hosts to connect to (comma delimited with ports)
database              required    Database containing the collection
collection            required    The name of the collection to use
username              required    Username to use for authentication
password              required    Password to use for authentication
connection.options    optional    Additional connection options (e.g. ssl=true)
copy.existing         optional    Copy existing data from the collection (default: true)
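When connections are scripted rather than created in the web console, it can help to validate the property set first. The Python sketch below checks a dictionary against the required and optional property names listed above and fills in the documented default for copy.existing. The function `build_properties` and all example values (database, collection, credentials) are hypothetical placeholders; the sketch does not call any Decodable API.

```python
REQUIRED = ("hosts", "database", "collection", "username", "password")
OPTIONAL = ("connection.options", "copy.existing")


def build_properties(props):
    """Validate connector properties and fill in the documented default."""
    unknown = sorted(set(props) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unsupported properties: {', '.join(unknown)}")
    missing = [name for name in REQUIRED if not props.get(name)]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    # copy.existing defaults to true when it is not set explicitly.
    return {"copy.existing": "true", **props}


properties = build_properties({
    "hosts": "mongodb-1.tld:27017,mongodb-2.tld:27017",
    "database": "example_db",          # placeholder
    "collection": "customers",         # placeholder
    "username": "cdc_user",            # placeholder
    "password": "change-me",           # placeholder
    "connection.options": "ssl=true",  # enables TLS/SSL
})
print(sorted(properties))
```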

Data Type Mappings

The table below maps the data types found in Decodable streams to their corresponding types in MongoDB. See the MongoDB documentation for more information on MongoDB's types, and the Decodable documentation for its supported types.

Decodable Type                                 MongoDB Type
INT                                            Int
BIGINT                                         Long
FLOAT                                          Long
DOUBLE                                         Double
DECIMAL                                        Decimal128
BOOLEAN                                        Boolean
DATE                                           Date, Timestamp
TIME                                           Date, Timestamp
TIMESTAMP(3)                                   Date
TIMESTAMP_LTZ(3)                               Date
TIMESTAMP(0)                                   Timestamp
TIMESTAMP_LTZ(0)                               Timestamp
STRING                                         String
STRING                                         ObjectId
STRING                                         UUID
STRING                                         Symbol
STRING                                         MD5
STRING                                         JavaScript
STRING                                         Regex
BYTES                                          BinData
ROW                                            Object
ARRAY                                          Array
ROW<$ref STRING, $id STRING>                   DBPointer
(Point) ROW<type STRING, coordinates ARRAY>    GeoJson
(Line) ROW<type STRING, coordinates ARRAY>     GeoJson
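When declaring a schema, the lookup usually runs in the other direction: from the MongoDB type of a field to the Decodable types that can represent it. The Python sketch below transcribes the pairs from the table above and answers that question. The names `TYPE_PAIRS` and `decodable_types_for` are hypothetical, and the GeoJson rows are left out for brevity.

```python
# (Decodable type, MongoDB type) pairs, transcribed from the mapping table.
TYPE_PAIRS = [
    ("INT", "Int"),
    ("BIGINT", "Long"),
    ("FLOAT", "Long"),
    ("DOUBLE", "Double"),
    ("DECIMAL", "Decimal128"),
    ("BOOLEAN", "Boolean"),
    ("DATE", "Date"),
    ("DATE", "Timestamp"),
    ("TIME", "Date"),
    ("TIME", "Timestamp"),
    ("TIMESTAMP(3)", "Date"),
    ("TIMESTAMP_LTZ(3)", "Date"),
    ("TIMESTAMP(0)", "Timestamp"),
    ("TIMESTAMP_LTZ(0)", "Timestamp"),
    ("STRING", "String"),
    ("STRING", "ObjectId"),
    ("STRING", "UUID"),
    ("STRING", "Symbol"),
    ("STRING", "MD5"),
    ("STRING", "JavaScript"),
    ("STRING", "Regex"),
    ("BYTES", "BinData"),
    ("ROW", "Object"),
    ("ARRAY", "Array"),
    ("ROW<$ref STRING, $id STRING>", "DBPointer"),
]


def decodable_types_for(mongodb_type):
    """Return the Decodable types listed for a MongoDB type, in table order."""
    return [decodable for decodable, mongo in TYPE_PAIRS if mongo == mongodb_type]


print(decodable_types_for("ObjectId"))
print(decodable_types_for("Date"))
```

An empty list means the table has no entry for that MongoDB type. Where several Decodable types are listed, the choice depends on how the field is used downstream.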