ClickPipes for Postgres FAQ
How does idling affect my Postgres CDC ClickPipe?
If your ClickHouse Cloud service is idling, your Postgres CDC ClickPipe will continue to sync data, your service will wake-up at the next sync interval to handle the incoming data. Once the sync is finished and the idle period is reached, your service will go back to idling.
As an example, if your sync interval is set to 30 mins and your service idle time is set to 10 mins, Your service will wake-up every 30 mins and be active for 10 mins, then go back to idling.
How are TOAST columns handled in ClickPipes for Postgres?
Please refer to the Handling TOAST Columns page for more information.
How are generated columns handled in ClickPipes for Postgres?
Please refer to the Postgres Generated Columns: Gotchas and Best Practices page for more information.
Do tables need to have primary keys to be part of Postgres CDC?
Yes, for CDC, tables must have either a primary key or a REPLICA IDENTITY. The REPLICA IDENTITY can be set to FULL or configured to use a unique index.
Do you support partitioned tables as part of Postgres CDC?
Yes, partitioned tables are supported out of the box, as long as they have a PRIMARY KEY or REPLICA IDENTITY defined. The PRIMARY KEY and REPLICA IDENTITY must be present on both the parent table and its partitions. You can read more about it here.
Can I connect Postgres databases that don't have a public IP or are in private networks?
Yes! ClickPipes for Postgres offers two ways to connect to databases in private networks:
- 
SSH Tunneling - Works well for most use cases
- See the setup instructions here
- Works across all regions
 
- 
AWS PrivateLink - Available in three AWS regions:
- us-east-1
- us-east-2
- eu-central-1
 
- For detailed setup instructions, see our PrivateLink documentation
- For regions where PrivateLink is not available, please use SSH tunneling
 
- Available in three AWS regions:
How do you handle UPDATEs and DELETEs?
ClickPipes for Postgres captures both INSERTs and UPDATEs from Postgres as new rows with different versions (using the _peerdb_ version column) in ClickHouse. The ReplacingMergeTree table engine periodically performs deduplication in the background based on the ordering key (ORDER BY columns), retaining only the row with the latest _peerdb_ version.
DELETEs from Postgres are propagated as new rows marked as deleted (using the _peerdb_is_deleted column). Since the deduplication process is asynchronous, you might temporarily see duplicates. To address this, you need to handle deduplication at the query layer.
For more details, refer to:
Do you support schema changes?
Please refer to the ClickPipes for Postgres: Schema Changes Propagation Support page for more information.
What are the costs for ClickPipes for Postgres CDC?
During the preview, ClickPipes is free of cost. Post-GA, pricing is still to be determined. The goal is to make the pricing reasonable and highly competitive compared to external ETL tools.
My replication slot size is growing or not decreasing; what might be the issue?
If you're noticing that the size of your Postgres replication slot keeps increasing or isn't coming back down, it usually means that WAL (Write-Ahead Log) records aren't being consumed (or "replayed") quickly enough by your CDC pipeline or replication process. Below are the most common causes and how you can address them.
- 
Sudden Spikes in Database Activity - Large batch updates, bulk inserts, or significant schema changes can quickly generate a lot of WAL data.
- The replication slot will hold these WAL records until they are consumed, causing a temporary spike in size.
 
- 
Long-Running Transactions - An open transaction forces Postgres to keep all WAL segments generated since the transaction began, which can dramatically increase slot size.
- Set statement_timeoutandidle_in_transaction_session_timeoutto reasonable values to prevent transactions from staying open indefinitely:Use this query to identify unusually long-running transactions.
 
- 
Maintenance or Utility Operations (e.g., pg_repack)- Tools like pg_repackcan rewrite entire tables, generating large amounts of WAL data in a short time.
- Schedule these operations during slower traffic periods or monitor your WAL usage closely while they run.
 
- Tools like 
- 
VACUUM and VACUUM ANALYZE - Although necessary for database health, these operations can create extra WAL traffic—especially if they scan large tables.
- Consider using autovacuum tuning parameters or scheduling manual VACUUM operations during off-peak hours.
 
- 
Replication Consumer Not Actively Reading the Slot - If your CDC pipeline (e.g., ClickPipes) or another replication consumer stops, pauses, or crashes, WAL data will accumulate in the slot.
- Ensure your pipeline is continuously running and check logs for connectivity or authentication errors.
 
For an excellent deep dive into this topic, check out our blog post: Overcoming Pitfalls of Postgres Logical Decoding.
How are Postgres data types mapped to ClickHouse?
ClickPipes for Postgres aims to map Postgres data types as natively as possible on the ClickHouse side. This document provides a comprehensive list of each data type and its mapping: Data Type Matrix.
Can I define my own data type mapping while replicating data from Postgres to ClickHouse?
Currently, we don't support defining custom data type mappings as part of the pipe. However, note that the default data type mapping used by ClickPipes is highly native. Most column types in Postgres are replicated as closely as possible to their native equivalents on ClickHouse. Integer array types in Postgres, for instance, are replicated as integer array types on ClickHouse.
How are JSON and JSONB columns replicated from Postgres?
JSON and JSONB columns are replicated as String type in ClickHouse. Since ClickHouse supports a native JSON type, you can create a materialized view over the ClickPipes tables to perform the translation if needed. Alternatively, you can use JSON functions directly on the String column(s). We are actively working on a feature that replicates JSON and JSONB columns directly to the JSON type in ClickHouse. This feature is expected to be available in a few months.
What happens to inserts when a mirror is paused?
When you pause the mirror, the messages are queued up in the replication slot on the source Postgres, ensuring they are buffered and not lost. However, pausing and resuming the mirror will re-establish the connection, which could take some time depending on the source.
During this process, both the sync (pulling data from Postgres and streaming it into the ClickHouse raw table) and normalize (from raw table to target table) operations are aborted. However, they retain the state required to resume durably.
- For sync, if it is canceled mid-way, the confirmed_flush_lsn in Postgres is not advanced, so the next sync will start from the same position as the aborted one, ensuring data consistency.
- For normalize, the ReplacingMergeTree insert order handles deduplication.
In summary, while sync and normalize processes are terminated during a pause, it is safe to do so as they can resume without data loss or inconsistency.
Can ClickPipe creation be automated or done via API or CLI?
A Postgres ClickPipe can also be created and managed via OpenAPI endpoints. This feature is in beta, and the API reference can be found here. We are actively working on Terraform support to create Postgres ClickPipes as well.
How do I speed up my initial load?
You cannot speed up an already running initial load. However, you can optimize future initial loads by adjusting certain settings. By default, the settings are configured with 4 parallel threads and a snapshot number of rows per partition set to 100,000. These are advanced settings and are generally sufficient for most use cases.
For Postgres versions 13 or lower, CTID range scans are slower, and these settings become more critical. In such cases, consider the following process to improve performance:
- Drop the existing pipe: This is necessary to apply new settings.
- Delete destination tables on ClickHouse: Ensure that the tables created by the previous pipe are removed.
- Create a new pipe with optimized settings: Typically, increase the snapshot number of rows per partition to between 1 million and 10 million, depending on your specific requirements and the load your Postgres instance can handle.
These adjustments should significantly enhance the performance of the initial load, especially for older Postgres versions. If you are using Postgres 14 or later, these settings are less impactful due to improved support for CTID range scans.
How should I scope my publications when setting up replication?
You can let ClickPipes manage your publications (requires additional permissions) or create them yourself. With ClickPipes-managed publications, we automatically handle table additions and removals as you edit the pipe. If self-managing, carefully scope your publications to only include tables you need to replicate - including unnecessary tables will slow down Postgres WAL decoding.
If you include any table in your publication, make sure it has either a primary key or REPLICA IDENTITY FULL. If you have tables without a primary key, creating a publication for all tables will cause DELETE and UPDATE operations to fail on those tables.
To identify tables without primary keys in your database, you can use this query:
You have two options when dealing with tables without primary keys:
- 
Exclude tables without primary keys from ClickPipes: Create the publication with only the tables that have a primary key: 
- 
Include tables without primary keys in ClickPipes: If you want to include tables without a primary key, you need to alter their replica identity to FULL. This ensures that UPDATE and DELETE operations work correctly:
If you're creating a publication manually instead of letting ClickPipes manage it, we don't recommend creating a publication FOR ALL TABLES, this leads to more traffic from Postgres to ClickPipes (to sending changes for other tables not in the pipe) and reduces overall efficiency.
For manually created publications, please add any tables you want to the publication before adding them to the pipe.
Recommended max_slot_wal_keep_size Settings
- At Minimum: Set max_slot_wal_keep_sizeto retain at least two days' worth of WAL data.
- For Large Databases (High Transaction Volume): Retain at least 2-3 times the peak WAL generation per day.
- For Storage-Constrained Environments: Tune this conservatively to avoid disk exhaustion while ensuring replication stability.
How to Calculate the Right Value
To determine the right setting, measure the WAL generation rate:
For PostgreSQL 10+:
For PostgreSQL 9.6 and below:
- Run the above query at different times of the day, especially during highly transactional periods.
- Calculate how much WAL is generated per 24-hour period.
- Multiply that number by 2 or 3 to provide sufficient retention.
- Set max_slot_wal_keep_sizeto the resulting value in MB or GB.
Example:
If your database generates 100 GB of WAL per day, set:
My replication slot is invalidated. What should I do?
The only way to recover ClickPipe is by triggering a resync, which you can do in the Settings page.
The most common cause of replication slot invalidation is a low max_slot_wal_keep_size setting on your PostgreSQL database (e.g., a few gigabytes). We recommend increasing this value. Refer to this section on tuning max_slot_wal_keep_size. Ideally, this should be set to at least 200GB to prevent replication slot invalidation.
In rare cases, we have seen this issue occur even when max_slot_wal_keep_size is not configured. This could be due to an intricate and rare bug in PostgreSQL, although the cause remains unclear.
I am seeing Out Of Memory (OOMs) on ClickHouse while my ClickPipe is ingesting data. Can you help?
One common reason for OOMs on ClickHouse is that your service is undersized. This means that your current service configuration doesn't have enough resources (e.g., memory or CPU) to handle the ingestion load effectively. We strongly recommend scaling up the service to meet the demands of your ClickPipe data ingestion.
Another reason we've observed is the presence of downstream Materialized Views with potentially unoptimized joins:
- 
A common optimization technique for JOINs is if you have a LEFT JOINwhere the right-hand side table is very large. In this case, rewrite the query to use aRIGHT JOINand move the larger table to the left-hand side. This allows the query planner to be more memory efficient.
- 
Another optimization for JOINs is to explicitly filter the tables through subqueriesorCTEsand then perform theJOINacross these subqueries. This provides the planner with hints on how to efficiently filter rows and perform theJOIN.
I am seeing an invalid snapshot identifier during the initial load. What should I do?
The invalid snapshot identifier error occurs when there is a connection drop between ClickPipes and your Postgres database. This can happen due to gateway timeouts, database restarts, or other transient issues.
It is recommended that you do not carry out any disruptive operations like upgrades or restarts on your Postgres database while Initial Load is in progress and ensure that the network connection to your database is stable.
To resolve this issue, you can trigger a resync from the ClickPipes UI. This will restart the initial load process from the beginning.
What happens if I drop a publication in Postgres?
Dropping a publication in Postgres will break your ClickPipe connection since the publication is required for the ClickPipe to pull changes from the source. When this happens, you'll typically receive an error alert indicating that the publication no longer exists.
To recover your ClickPipe after dropping a publication:
- Create a new publication with the same name and required tables in Postgres
- Click the 'Resync tables' button in the Settings tab of your ClickPipe
This resync is necessary because the recreated publication will have a different Object Identifier (OID) in Postgres, even if it has the same name. The resync process refreshes your destination tables and restores the connection.
Alternatively, you can create an entirely new pipe if preferred.
Note that if you're working with partitioned tables, make sure to create your publication with the appropriate settings:
What if I am seeing Unexpected Datatype errors or Cannot parse type XX ...
This error typically occurs when the source Postgres database has a datatype which cannot be mapped during ingestion. For more specific issue, refer to the possibilities below.
Cannot parse type Decimal(XX, YY), expected non-empty binary data with size equal to or less than ...
Postgres NUMERICs have really high precision (up to 131072 digits before the decimal point; up to 16383 digits after the decimal point) and ClickHouse Decimal type allows maximum of (76 digits, 39 scale).
The system assumes that usually the size would not get that high and does an optimistic cast for the same as source table can have large number of rows or the row can come in during the CDC phase.
The current workaround would be to map the NUMERIC type to string on ClickHouse. To enable this please raise a ticket with the support team and this will be enabled for your ClickPipes.
