ClickHouse
ClickHouse® is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP). It is a popular database with nearly 25,000 GitHub stars and stylizing itself as “a free analytics DBMS for big data,” ClickHouse has achieved widespread adoption and helped engineers everywhere perform real type analytics at scale. Benchmarked at 100x faster than Hive or MySQL, ClickHouse is adopted by many engineering teams to serve queries at very low latencies across large datasets. To achieve this performance at scale compared to standard data warehouses like Snowflake, it makes some architectural tradeoffs that users should keep in mind. For this reason, users should consider conforming the data to ClickHouse best practices before ingestion. Decodable makes it easy to transform and prepare data, allowing you to get the most out of it when the data lands in ClickHouse.
Getting Started
Connections come in two flavors: source and sink. Source connections read from an external system and write to a Decodable stream, while sink connections read from a stream and write to an external system. ClickHouse connectors can only be used in the sink
role.
Configure As A Sink
To create and configure a connector for ClickHouse, sign in to the Decodable Web Console, navigate to the Connections tab, click on New Connection
, and follow the steps below. For examples of using the command line tools or scripting, see the How To guides.
-
The connector type will default to
sink
, since that is the only option for ClickHouse connectors. -
Specify the hostname and port to connect to your ClickHouse database. Note that these access parameters should specify the web address for ClickHouse, not the native TCP address.
-
Provide the Cluster, Database, and Table name that the sink should write to. Decodable does not create any of these entities for you, so they must already exist in your ClickHouse instance in order for Decodable to successfully write data.
-
Provide the username and password of the user on whose behalf the connection is being made.
Note that while Decodable may write APPEND-only data to both distributed and non-distributed tables, CHANGE stream data may only be written to non-distributed tables. As such it is generally recommended that you not use the distributed table engine for ClickHouse sink tables.
It is also strongly recommended that you avoid writing to buffer tables, as their flush strategy nullifies Decodable's delivery guarantees.
Reference
Connector name | clickhouse |
Type | sink |
Delivery guarantee | at least once |
The ClickHouse sink connects to a ClickHouse table through HTTP requests. ClickHouse sinks can materialize data from streams of both APPEND and CHANGE types; however, be advised that clickhouse sinks handling change streams (particularly those with frequent update and delete records) may operate significantly slower than pure append streams.
When a ClickHouse sink is connected to a CHANGE stream, a primary key must be specified in the schema
Data Type Mappings
Below is a mapping from data types found in Decodable streams, to their corresponding type in ClickHouse. See the documentation for more info on ClickHouse's types, and Decodable's supported types.
Decodable Type | ClickHouse Type |
---|---|
CHAR | String |
VARCHAR | String / IP / UUID |
STRING | String / Enum |
BOOLEAN | UInt8 |
BYTES | FixedString |
DECIMAL | Decimal / Int128 / Int256 / UInt64 / UInt128 / UInt256 |
TINYINT | Int8 |
SMALLINT | Int16 / UInt8 |
INT | Int32 / UInt16 / Interval |
BIGINT | Int64 / UInt32 |
FLOAT | Float32 |
DOUBLE | Float64 |
DATE | Date |
TIME | DateTime |
TIMESTAMP | DateTime |
TIMESTAMP_LTZ | DateTime |
INTERVAL | Int32 / Int64 |
ARRAY | Array |
MAP | Map |
ROW | Not supported |
MULTISET | Not supported |
Updated 3 months ago