MongoDB source connector

Features

Connector name

mongodb-cdc

Delivery guarantee

Exactly once

Supported task sizes

M, L

Multiplex capability

A single instance of this connector can read from a single collection.

Supported stream types

Change stream

Configuration properties

Property Description Required Default

Basic

hosts

One or more comma-separated host names for your database in Standard Connection String Format. If you have a DNS Seed List URL, you must find the underlying MongoDB instance host names and add them here.

For example: mongodb-1.tld:27017,mongodb-2.tld:27017,mongodb-3.tld:27017

If you are using MongoDB Atlas, the list of host names to include can be found in the Overview page for your database.

Yes

database

The name of the database containing the collection.

Yes

collection

The name of the collection that you want to capture CDC records from.

Yes

username

The username to use to authenticate to MongoDB.

Yes

password

The password associated with the username. This must be provided as a secret resource.

Yes

copy.existing

Whether to copy existing data from the collection. See Migrating existing collections for more information.

true

Advanced

scan.startup.mode

Specifies where in the collection to start reading data when the connection is first started, or when it’s restarted with the state discarded. Must be one of the following:

  • initial: At startup, takes an initial snapshot of the monitored collection, then continuously reads the latest oplog entries.

  • latest-offset: Skips the initial snapshot at startup. Instead, reads changes from the end of the oplog, capturing only the modifications made after the connection was started or restarted.

initial

connection.options

Any additional configuration options needed to connect to the MongoDB cluster. See Connection String Options in the MongoDB documentation for a full list of connection string options.
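As an illustration of how the properties above fit together, the following Python sketch checks a property map before it is used. The property names come from the table. The validate_properties helper, the sample values, and the host check (a host name with an optional numeric port, which would reject IPv6 literals) are assumptions made for this example and are not part of Decodable.

```python
REQUIRED = ("hosts", "database", "collection", "username", "password")
STARTUP_MODES = ("initial", "latest-offset")


def validate_properties(props):
    """Return a list of problems found in a property map (hypothetical helper)."""
    problems = [f"missing required property: {name}"
                for name in REQUIRED if not props.get(name)]

    # hosts is a comma-separated list; each entry is host or host:port.
    if props.get("hosts"):
        for entry in props["hosts"].split(","):
            entry = entry.strip()
            host, sep, port = entry.partition(":")
            if not host or (sep and not port.isdigit()):
                problems.append(f"malformed host entry: {entry}")

    mode = props.get("scan.startup.mode", "initial")  # documented default
    if mode not in STARTUP_MODES:
        problems.append(f"unsupported scan.startup.mode: {mode}")
    return problems


# Sample values are placeholders, not real endpoints or credentials.
example = {
    "hosts": "mongodb-1.tld:27017,mongodb-2.tld:27017,mongodb-3.tld:27017",
    "database": "shop",
    "collection": "customers",
    "username": "decodable",
    "password": "<secret resource>",
    "copy.existing": "true",
    "scan.startup.mode": "initial",
}

print(validate_properties(example))  # prints []
```

A host list that uses a colon where a comma belongs, such as mongodb-2.tld:27017:mongodb-3.tld:27017, is reported as a malformed entry by this check.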

Prerequisites

  • Your MongoDB instance must be publicly accessible. Decodable uses the username and password provided during connection creation to authenticate to the database.

  • You must have a MongoDB user with the changeStream and read privileges.

  • Your MongoDB instance must be configured for change stream replication. See Change Streams in the MongoDB documentation for more information.

  • The incoming data must contain a field named _id, and that field must be specified as a primary key in the Decodable stream. To specify a primary key, you must first explicitly declare the field's type as non-nullable by entering: <type> NOT NULL. For example: BIGINT NOT NULL.
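One way to satisfy the user prerequisite on a self-managed deployment is sketched below in Python. It only builds a MongoDB createUser command document. The create_user_command helper and the sample names are hypothetical, and the sketch assumes that the built-in read role is sufficient, since that role grants the find and changeStream actions on a database's non-system collections. Managed services such as MongoDB Atlas typically create users through their own console instead.

```python
def create_user_command(username, password, database):
    """Build a createUser command document (hypothetical helper).

    Assumes the built-in `read` role on `database` covers the
    changeStream and read privileges the connector needs.
    """
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [{"role": "read", "db": database}],
    }


# Placeholder values for illustration only.
command = create_user_command("decodable", "change-me", "shop")
```

With a driver such as PyMongo, a document like this could be passed to Database.command on the database where the user should be defined.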

Oplog retention

If the connection is stopped or in a failed state for longer than the oplog’s retention period, the connection will fail when it’s restarted. This is because CDC requires a contiguous series of oplog entries, and the entries it needs to resume from are no longer available.

If you want to restart the connection in this situation, you must discard its current state. Doing so takes a new initial snapshot of the collection, after which the connection resumes reading from the oplog.

To do this, do one of the following:

  1. In the Decodable Web UI, select Start and under Starting State select Reset current state and start from the initial state.

  2. In the Decodable CLI, do one of the following:

    1. Use connection activate and add the --force flag, for example:

      decodable connection activate cef0e708 --force

      or

    2. Use query with a suitable specifier for the connection (such as --name) and add the --operation reset-state argument, for example:

      decodable query --name customers-source --operation reset-state
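The retention rule described in this section can be expressed as a simple comparison. The Python sketch below is illustrative only: the needs_state_reset helper is hypothetical, and the durations are made-up values rather than defaults of MongoDB or Decodable.

```python
from datetime import timedelta


def needs_state_reset(downtime, oplog_retention):
    """Return True if the connection was down longer than the oplog retention period.

    In that case the oplog no longer holds a contiguous series of entries
    from the point where the connection stopped, so its state must be discarded.
    """
    return downtime > oplog_retention


# Down for 30 hours against a 24-hour oplog window: state must be reset.
print(needs_state_reset(timedelta(hours=30), timedelta(hours=24)))  # True
# Down for 2 hours: the connection can resume from its saved state.
print(needs_state_reset(timedelta(hours=2), timedelta(hours=24)))   # False
```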

Connector starting state and offsets

When you create a connection, or restart it and discard its state, it reads from the database according to the scan.startup.mode setting. By default this is initial, so the connection snapshots the monitored collection and then reads the oplog.

Learn more about starting state here.
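The difference between the two startup modes can be shown with a toy model. The Python sketch below is not how the connector is implemented. The records_read function and its inputs are hypothetical and only mirror the documented behavior of initial and latest-offset.

```python
def records_read(mode, existing_documents, later_changes):
    """Toy model of which records a fresh connection reads in each startup mode."""
    if mode == "initial":
        # Snapshot of what is already in the collection, then oplog changes.
        return list(existing_documents) + list(later_changes)
    if mode == "latest-offset":
        # No snapshot: only changes made after the connection started.
        return list(later_changes)
    raise ValueError(f"unsupported scan.startup.mode: {mode}")


existing = ["customer-1", "customer-2"]
changes = ["customer-3 inserted"]

print(records_read("initial", existing, changes))
# ['customer-1', 'customer-2', 'customer-3 inserted']
print(records_read("latest-offset", existing, changes))
# ['customer-3 inserted']
```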

Data types mapping

The following table shows the Decodable data types that are generated from the corresponding MongoDB data types.

Decodable Type                                         MongoDB Type

INT                                                    Int
BIGINT                                                 Long
FLOAT                                                  Long
DOUBLE                                                 Double
DECIMAL                                                Decimal128
BOOLEAN                                                Boolean
DATE                                                   Date, Timestamp
TIME                                                   Date, Timestamp
TIMESTAMP(3)                                           Date
TIMESTAMP_LTZ(3)                                       Date
TIMESTAMP(0)                                           Timestamp
TIMESTAMP_LTZ(0)                                       Timestamp
STRING                                                 String
STRING                                                 ObjectId
STRING                                                 UUID
STRING                                                 Symbol
STRING                                                 MD5
STRING                                                 JavaScript
STRING                                                 Regex
BYTES                                                  BinData
ROW                                                    Object
ARRAY                                                  Array
ROW<$ref STRING, $id STRING>                           DBPointer
(Point) ROW<type STRING, coordinates ARRAY<DOUBLE>>    GeoJson
(Line) ROW<type STRING, coordinates ARRAY<DOUBLE>>     GeoJson