MongoDB source connector

Use the MongoDB CDC (Change Data Capture) connector to get data from MongoDB into Decodable. The MongoDB CDC connector is powered by Debezium. It captures change events (inserts, updates, and deletes) and sends them through Decodable.

If you are looking for instructions on how to send data from Decodable into MongoDB, see MongoDB sink connector in the Connect to a data destination chapter.

Features

Delivery guarantee: Exactly once

Prerequisites

Before you can get data from MongoDB, the following requirements must be met:

- Your MongoDB instance must be publicly accessible.
- You must have a MongoDB user with the changeStream and read privileges. Decodable uses the username and password provided during connection creation to authenticate to the database.
- Your MongoDB instance must be configured for change stream replication. See Change Streams in the MongoDB documentation for more information.

Steps

If you want to use the Decodable CLI or API to create the connection, refer to the property names listed below for the underlying property names. The connector name is mongodb-cdc.

From the Connections page, select the MongoDB connector and complete the following fields. Each field is listed by its UI name, followed by its property name in parentheses.

Connection Type (N/A)
Select Source to use this connector to get data into Decodable.

Hosts (hosts)
One or more host names for your database, in Standard Connection String Format. If you have a DNS Seed List URL, you must find the underlying MongoDB instance host names and add them here. For example: mongodb-1.tld:27017,mongodb-2.tld:27017,mongodb-3.tld:27017. If you are using MongoDB Atlas, you can find the list of host names to include on the Overview page for your database.

Database (database)
The name of the database containing the collection.

Collection (collection)
The name of the collection that you want to capture CDC records from.

Username (username)
The username to use to authenticate to MongoDB.
Password (password)
The password associated with the username. This must be provided as a secret resource. If you are using the Decodable CLI, run decodable secret list to view available secrets, or decodable secret --help for help with creating a new secret.

Note: For security purposes, Decodable never displays secret values in plaintext. You can manage which users have permissions to create, delete, or modify secrets in the Access Control management view. See Roles, groups, and permissions for more information.

Scan Startup Mode (scan.startup.mode)
Optional. Specifies where in the collection to start reading data when the connection is first started, or when it’s restarted with its state discarded. Must be one of the following:
- initial (default): At startup, takes an initial snapshot of the monitored collection, then continuously reads the latest oplog entries.
- latest-offset: Skips the initial snapshot at startup. Instead, reads changes from the end of the oplog, capturing only the modifications made since the connector was started or restarted.

Copy Existing Data (copy.existing)
Optional. Set to true to copy existing data from the collection. Defaults to true. See Migrating existing collections for more information.

Additional Configuration Options (connection.options)
Optional. Any additional configuration options needed to connect to the MongoDB cluster. See Connection String Options in the MongoDB documentation for a full list of connection string options.

Select the stream that you’d like to connect to this connector, or create a new stream. Then, select Next.

If you decided to create a new stream in the previous step, you must define its schema. Select New Schema to manually enter the fields and field types, or Import Schema to paste the schema in Avro or JSON format. The stream’s schema must match the schema of the data that you plan on sending through this connection.
For information on how MongoDB types map to Decodable types, see Data type mappings. The incoming data must contain a field named _id, and that field must be specified as the primary key. To specify a primary key, you must first explicitly declare its type as non-nullable by entering <type> NOT NULL, for example BIGINT NOT NULL.

For more information about Change Data Capture, change streams, or creating a stream, see the following pages:
- About Change Data Capture and Change Streams
- Create and manage Streams

Select Next when you have finished defining the connection’s schema. Give the newly created connection a Name and Description, and select Save.

Oplog retention

If the connection is stopped or in a failed state for longer than the oplog’s retention period, the connection fails when it’s restarted. This is because CDC requires a contiguous series of oplog entries. To restart the connection in this situation, you must discard its current state. The connection then starts reading again according to its scan startup mode: with the default (initial), the snapshot of the collection is taken again and the oplog is used for subsequent reads. To discard the state, do one of the following:
- In the Decodable Web UI, select Start and, under Starting State, select Reset current state and start from the initial state.
- In the Decodable CLI, do one of the following:
  - Use connection activate with the --force flag, for example: decodable connection activate cef0e708 --force
  - Use query with a suitable specifier for the connection (such as --name) and add the --operation reset-state argument, for example: decodable query --name customers-source --operation reset-state

Connector starting state and offsets

When you create a connection, or restart it and discard its state, it reads from the database based on the configured scan startup mode. By default this is initial, so the connector snapshots the monitored collection and reads the oplog thereafter. Learn more about starting state in the Decodable documentation.
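The difference between the two scan.startup.mode values can be illustrated with a small in-memory simulation. This is a minimal sketch, not how the connector is implemented: the function name `events_emitted`, the event tuples, and the way the collection and oplog are represented are all hypothetical, chosen only to show which records each mode produces.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape for illustration: (kind, _id, document).
# kind is "snapshot" for documents read during the initial snapshot, or
# "insert" / "update" / "delete" for changes read from the oplog.
Event = Tuple[str, str, Optional[dict]]


def events_emitted(
    mode: str,
    existing_docs: Dict[str, dict],
    changes_after_start: List[Event],
) -> List[Event]:
    """Show which events reach the stream for a given scan.startup.mode.

    existing_docs: documents already in the collection when the connection starts.
    changes_after_start: changes written to the oplog after the connection starts.
    """
    if mode == "initial":
        # Snapshot every existing document first, then follow the oplog.
        snapshot = [("snapshot", _id, doc) for _id, doc in existing_docs.items()]
        return snapshot + list(changes_after_start)
    if mode == "latest-offset":
        # Skip the snapshot: only changes made after startup are captured.
        return list(changes_after_start)
    raise ValueError(f"scan.startup.mode must be 'initial' or 'latest-offset', got {mode!r}")


existing = {"1": {"name": "a"}, "2": {"name": "b"}}
changes = [("update", "1", {"name": "a2"}), ("delete", "2", None)]
print(len(events_emitted("initial", existing, changes)))        # 4
print(len(events_emitted("latest-offset", existing, changes)))  # 2
```

With initial, documents that never change after startup still appear in the stream (from the snapshot); with latest-offset, they appear only once they are modified.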
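The Oplog retention rule comes down to one comparison: a stopped connection can resume only if the oplog still contains the entry it last read. The sketch below expresses that check. The function name and its inputs are assumptions for illustration, not part of Decodable’s API; in a replica set, the oldest retained entry is the first document in the local.oplog.rs collection.

```python
from datetime import datetime, timedelta, timezone


def needs_state_reset(last_read: datetime, oldest_oplog_entry: datetime) -> bool:
    """Return True if the oplog no longer reaches back to the resume point.

    last_read: time of the last oplog entry the connection processed before
        it stopped (a hypothetical input for this sketch).
    oldest_oplog_entry: time of the oldest entry still retained in the oplog.
    """
    # CDC needs a contiguous run of oplog entries. If the oldest retained entry
    # is newer than the resume point, entries in between were discarded and
    # the connection cannot resume without resetting its state.
    return oldest_oplog_entry > last_read


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
oldest = now - timedelta(hours=24)  # assume roughly 24 hours of oplog retained

print(needs_state_reset(now - timedelta(hours=2), oldest))  # False: stopped 2 hours ago
print(needs_state_reset(now - timedelta(days=3), oldest))   # True: stopped 3 days ago
```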
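The Hosts and Additional Configuration Options fields described in the Steps section both take plain strings. The sketch below builds them from Python values: hosts as a comma-separated host:port list, and connection options as key=value pairs joined by ampersands. The ampersand-separated format for connection.options is an assumption, borrowed from the query-string part of a MongoDB connection string; confirm it before relying on it. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_hosts(hosts: List[Tuple[str, int]]) -> str:
    """Join (host, port) pairs into a comma-separated value for the hosts property."""
    if not hosts:
        raise ValueError("at least one host is required")
    return ",".join(f"{host}:{port}" for host, port in hosts)


def build_connection_options(options: Dict[str, str]) -> str:
    """Encode connection string options as key=value pairs joined by '&'.

    Assumption: connection.options accepts the same format as the query string
    of a MongoDB connection string. Values are strings so that booleans are
    written as "true"/"false" rather than Python's "True"/"False".
    """
    return urlencode(options)


print(build_hosts([("mongodb-1.tld", 27017), ("mongodb-2.tld", 27017), ("mongodb-3.tld", 27017)]))
# mongodb-1.tld:27017,mongodb-2.tld:27017,mongodb-3.tld:27017
print(build_connection_options({"authSource": "admin", "tls": "true"}))
# authSource=admin&tls=true
```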
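The schema requirement in the Steps section (an _id field, declared NOT NULL and specified as the primary key) is easy to miss when importing a schema. The following sketch checks a schema for it before you enter it in Decodable. The list-of-pairs representation and the function name are hypothetical; they are not Decodable’s schema format.

```python
from typing import List, Tuple


def check_cdc_schema(fields: List[Tuple[str, str]], primary_key: List[str]) -> List[str]:
    """Return a list of problems with a MongoDB CDC stream schema (empty if valid).

    fields: (name, type) pairs, for example ("_id", "STRING NOT NULL").
    primary_key: names of the fields that make up the primary key.
    """
    problems = []
    types = dict(fields)
    if "_id" not in types:
        problems.append("schema must contain a field named _id")
    else:
        # Normalize whitespace and case so "string  not null" is also accepted.
        normalized = " ".join(types["_id"].split()).upper()
        if not normalized.endswith("NOT NULL"):
            problems.append("_id must be declared NOT NULL")
    if "_id" not in primary_key:
        problems.append("_id must be specified as the primary key")
    return problems


print(check_cdc_schema([("_id", "STRING NOT NULL"), ("name", "STRING")], ["_id"]))  # []
```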
Data type mappings

The following table describes the mapping of Decodable data types to their MongoDB data type counterparts.

Decodable type                                      MongoDB type
INT                                                 Int
BIGINT                                              Long
FLOAT                                               Long
DOUBLE                                              Double
DECIMAL                                             Decimal128
BOOLEAN                                             Boolean
DATE                                                Date, Timestamp
TIME                                                Date, Timestamp
TIMESTAMP(3)                                        Date
TIMESTAMP_LTZ(3)                                    Date
TIMESTAMP(0)                                        Timestamp
TIMESTAMP_LTZ(0)                                    Timestamp
STRING                                              String, ObjectId, UUID, Symbol, MD5, JavaScript, Regex
BYTES                                               BinData
ROW                                                 Object
ARRAY                                               Array
ROW<$ref STRING, $id STRING>                        DBPointer
ROW<type STRING, coordinates ARRAY<DOUBLE>>         GeoJSON (Point)
ROW<type STRING, coordinates ARRAY<ARRAY<DOUBLE>>>  GeoJSON (Line)
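When defining a stream schema by hand, it can help to look up which Decodable types the table pairs with a given MongoDB field type. The dictionary below is a transcription of the Data type mappings table; the GeoJSON rows are left out because the Decodable type depends on the geometry. The dictionary and function names are hypothetical helpers, not part of any Decodable tooling.

```python
from typing import Dict, List, Set

# Transcribed from the Data type mappings table: Decodable type -> MongoDB types.
DECODABLE_TO_MONGODB: Dict[str, Set[str]] = {
    "INT": {"Int"},
    "BIGINT": {"Long"},
    "FLOAT": {"Long"},
    "DOUBLE": {"Double"},
    "DECIMAL": {"Decimal128"},
    "BOOLEAN": {"Boolean"},
    "DATE": {"Date", "Timestamp"},
    "TIME": {"Date", "Timestamp"},
    "TIMESTAMP(3)": {"Date"},
    "TIMESTAMP_LTZ(3)": {"Date"},
    "TIMESTAMP(0)": {"Timestamp"},
    "TIMESTAMP_LTZ(0)": {"Timestamp"},
    "STRING": {"String", "ObjectId", "UUID", "Symbol", "MD5", "JavaScript", "Regex"},
    "BYTES": {"BinData"},
    "ROW": {"Object"},
    "ARRAY": {"Array"},
    "ROW<$ref STRING, $id STRING>": {"DBPointer"},
}


def decodable_types_for(mongodb_type: str) -> List[str]:
    """Return the Decodable types mapped to a MongoDB type, in table order."""
    return [
        decodable_type
        for decodable_type, mongodb_types in DECODABLE_TO_MONGODB.items()
        if mongodb_type in mongodb_types
    ]


print(decodable_types_for("Timestamp"))  # ['DATE', 'TIME', 'TIMESTAMP(0)', 'TIMESTAMP_LTZ(0)']
```

For example, an _id holding an ObjectId maps to STRING, which you would declare as STRING NOT NULL to use it as the primary key.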