In today’s open-source ecosystem, organizations are increasingly adopting open-source technologies to reduce licensing costs, avoid vendor lock-in, and improve flexibility and scalability. A common modernization initiative is migrating from proprietary databases such as Oracle to open-source platforms like PostgreSQL.
While any database migration must ensure business continuity and data integrity, minimal downtime is usually the biggest concern. Any disruption to production systems can affect operations, customer experience, and revenue. Therefore, a carefully planned migration strategy is essential to move data seamlessly from the source to the target database without data loss or interruption to ongoing transactions.
This CDC approach leverages fully open-source tools to achieve a reliable, near-zero-downtime transition. Ora2pg handles the initial full load: schema conversion, data extraction, and bulk data transfer from Oracle to PostgreSQL. After the initial load, Change Data Capture (CDC) is implemented using Debezium integrated with Apache Kafka to stream real-time database changes, such as inserts, updates, and deletes, to PostgreSQL.
By combining a full data load with CDC, organizations can ensure continuous synchronization, maintain data consistency, and execute a smooth cutover from Oracle to PostgreSQL with minimal business impact.
Core Concepts & Technologies
What is CDC (Change Data Capture)?
Change Data Capture (CDC) is a design pattern that tracks and captures changes made to a database, such as row inserts, updates, and deletes, in real time and streams those changes to downstream systems. The architecture used here flows from Oracle redo logs through the Debezium Oracle connector into Kafka topics, and from there through the Debezium JDBC sink connector into PostgreSQL.

What is Debezium?
Debezium is an open-source distributed platform for Change Data Capture. It monitors your databases and streams row-level changes to Apache Kafka topics. Debezium supports multiple databases including Oracle, PostgreSQL, MySQL, MongoDB, and more.
Debezium works as a Kafka Connect connector. It provides:
- Oracle Connector: Reads Oracle’s Redo/Archive Logs using LogMiner.
- JDBC Sink Connector: Writes CDC events from Kafka topics into the target database (PostgreSQL).
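Both connectors ship in the Debezium Connect image built later in this guide. Once the Kafka Connect service is up, a quick way to confirm they are available is the Connect REST API (port 8083 in this setup):

# List the connector plugins installed in the Kafka Connect image
curl -s http://localhost:8083/connector-plugins | python3 -m json.tool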
Key advantages of Debezium:
- Captures all changes: INSERT, UPDATE, and DELETE.
- Low latency: Changes streamed in milliseconds.
- Fault tolerant: Resumes from last position after failure.
- Schema aware: Understands table structures and handles schema changes.
What is Apache Kafka?
Apache Kafka is a distributed event streaming platform designed to handle high-throughput, fault-tolerant, real-time data pipelines. In this migration, Kafka acts as the message bus between Oracle (source) and PostgreSQL (target).
Think of Kafka as a high-speed, durable conveyor belt: Debezium puts change events onto the belt (topics), and the JDBC Sink connector picks them up and applies them to PostgreSQL.
Key Kafka concepts in this setup:
- Topics: Named channels where change events are published (e.g., oracle.HR.EMPLOYEES).
- Producers: Debezium Oracle Connector reads Oracle logs and publishes events.
- Consumers: Debezium JDBC Sink Connector reads events and writes to PostgreSQL.
- Brokers: Kafka servers that store and route messages.
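To make these concepts concrete, here is a minimal sketch of how the topics can be inspected once the stack described later is running; it assumes the container name kafka and the internal listener kafka:29092 from the Docker Compose file below.

# List all topics on the broker (the Confluent images ship the kafka-topics CLI)
docker exec kafka kafka-topics --bootstrap-server kafka:29092 --list

# Describe the change-event topic for the HR.EMPLOYEES table
docker exec kafka kafka-topics --bootstrap-server kafka:29092 --describe --topic oracle.HR.EMPLOYEES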
What is Zookeeper?
Apache Zookeeper is a centralized coordination service that Kafka uses to manage cluster metadata, leader election, and broker registration. In simple terms, Zookeeper is the ‘orchestrator’ that keeps track of Kafka’s internal state.
Prerequisites
Environment Details
Here is the environment, along with the ports used, to implement CDC.
- Host OS: Ubuntu 24.04
- Oracle Database: Enterprise Edition 21.3.0.0 (Container: oracle-db)
- PostgreSQL: Version 17 (Port: 5432)
- Kafka: confluentinc/cp-kafka:7.5.0 (Port: 9092)
- Zookeeper: 3.8 (Port: 2181)
- Debezium Connect: 2.5 (Port: 8083)
- Kafka UI: provectuslabs/kafka-ui (Port: 8080)
Network Configuration
All containers are attached to the shared Docker bridge network (cybertec_migrator_common, defined as external in the Compose file below):
- Oracle DB: 172.18.0.2
- Debezium: 172.18.0.6
- Kafka: accessible via hostname kafka
- Zookeeper: accessible via hostname zookeeper
- Kafka UI: used for monitoring the cluster
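To double-check which containers joined the network and what IPs they received, you can inspect the network with Docker (assuming the network name cybertec_migrator_common used in the Compose file):

# Print each attached container and its IP address
docker network inspect cybertec_migrator_common --format '{{range .Containers}}{{.Name}}: {{.IPv4Address}}{{println}}{{end}}'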
Oracle Database configuration
Oracle CDC via LogMiner requires Archive Log Mode to be enabled. This ensures all transaction changes are persisted in archive log files for Debezium to read.
SQL> ARCHIVE LOG LIST;
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination /opt/oracle/homes/OraDB21Home1/dbs/arch
Oldest online log sequence 3
Current log sequence 5
SQL> SHUTDOWN IMMEDIATE;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP MOUNT;
ORACLE instance started.
Total System Global Area 8589933520 bytes
Fixed Size 9706448 bytes
Variable Size 1375731712 bytes
Database Buffers 7197425664 bytes
Redo Buffers 7069696 bytes
Database mounted.
SQL> ALTER DATABASE ARCHIVELOG;
Database altered.
SQL> ARCHIVE LOG LIST;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /opt/oracle/homes/OraDB21Home1/dbs/arch
Oldest online log sequence 3
Next log sequence to archive 5
Current log sequence 5
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL>
SQL> SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;
SUPPLEME
--------
YES
SQL> ALTER SESSION SET CONTAINER=ORCLPDB1;
Session altered.
SQL> SHOW CON_NAME;
CON_NAME
------------------------------
ORCLPDB1
Oracle Setup for Debezium CDC
Let’s run the following on Machine-1, where Oracle is running:
#Connect as SYSDBA to the CDB (Container Database)
sqlplus sys/your_password@//localhost:1521/ORCLCDB as sysdba
-- Step 1: Enable ARCHIVELOG mode (if not already enabled)
-- Check current mode:
-- SELECT LOG_MODE FROM V$DATABASE;
-- If NOARCHIVELOG, you need to restart in mount mode:
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Step 2: Enable supplemental logging at database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
-- Step 3: Create the Debezium capture user (Common User in CDB)
CREATE USER c##debezium IDENTIFIED BY dbz
DEFAULT TABLESPACE USERS
QUOTA UNLIMITED ON USERS
CONTAINER=ALL;
-- Step 4: Grant required privileges
GRANT CREATE SESSION TO c##debezium CONTAINER=ALL;
GRANT SET CONTAINER TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$DATABASE TO c##debezium CONTAINER=ALL;
GRANT FLASHBACK ANY TABLE TO c##debezium CONTAINER=ALL;
GRANT SELECT ANY TABLE TO c##debezium CONTAINER=ALL;
GRANT SELECT_CATALOG_ROLE TO c##debezium CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO c##debezium CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO c##debezium CONTAINER=ALL;
-- LogMiner specific privileges
GRANT LOGMINING TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOG_HISTORY TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO c##debezium CONTAINER=ALL;
-- Step 5: Enable supplemental logging for HR schema tables in PDB
ALTER SESSION SET CONTAINER = ORCLPDB1;
ALTER TABLE HR.COUNTRIES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.DEPARTMENTS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.EMPLOYEES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.JOBS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.JOB_HISTORY ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.LOCATIONS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.REGIONS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
-- Verify supplemental logging is enabled
SELECT SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_ALL FROM V$DATABASE;
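As a sanity check before moving on, verify that the capture user can log in and read the views Debezium relies on; a minimal sketch, assuming the sqlplus client on the Oracle host and the dbz password from the script above:

# Connect as the capture user to the root container (CDB) and probe LogMiner views
sqlplus -s 'c##debezium/dbz@//localhost:1521/ORCLCDB' << 'EOF'
-- Expect ARCHIVELOG after the earlier configuration
SELECT LOG_MODE FROM V$DATABASE;
-- The redo logs the connector will mine
SELECT GROUP#, STATUS FROM V$LOG;
EXIT;
EOF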
User creation in PostgreSQL
In parallel, create the user in PostgreSQL:
-- ============================================================
-- POSTGRESQL SETUP FOR DEBEZIUM CDC (Run on Machine-2 Postgres)
-- ============================================================
-- Connect as: psql -U postgres
-- ============================================================
-- Step 1: Create the debezium_user
CREATE USER debezium_user WITH PASSWORD 'postgres';
-- Step 2: Grant connect to the cdc_target database
GRANT CONNECT ON DATABASE cdc_target TO debezium_user;
-- Step 3: Switch to cdc_target database
-- \c cdc_target
-- Step 4: Grant usage on hr schema
GRANT USAGE ON SCHEMA hr TO debezium_user;
-- Step 5: Grant DML privileges on all existing tables in hr schema
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA hr TO debezium_user;
-- Step 6: Grant privileges on future tables created in hr schema
ALTER DEFAULT PRIVILEGES IN SCHEMA hr
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO debezium_user;
-- Step 7: Grant usage on sequences (if any)
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA hr TO debezium_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA hr
GRANT USAGE, SELECT ON SEQUENCES TO debezium_user;
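To confirm the grants took effect, here is a quick check run as postgres against cdc_target; this assumes the hr tables (e.g., hr.regions) already exist from the Ora2pg initial load:

# Verify schema usage and DML privileges for the new user
psql -U postgres -d cdc_target -c "SELECT has_schema_privilege('debezium_user', 'hr', 'USAGE') AS schema_usage, has_table_privilege('debezium_user', 'hr.regions', 'INSERT') AS can_insert;"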
Configure PostgreSQL Network Access
The following change needs to be made in the postgresql.conf file:
listen_addresses = '*'
Add the following entries to the pg_hba.conf file:
# Docker networks
host all all 172.17.0.0/16 md5
host all all 172.18.0.0/16 md5
host all all 172.31.0.0/16 md5
host cdc_target debezium_user 172.31.38.9/32 md5
Restart the PostgreSQL service:
sudo systemctl restart postgresql
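Before wiring up the sink connector, it is worth confirming that debezium_user can reach PostgreSQL over the network; a minimal sketch, assuming the host IP 172.31.7.166 used later in the sink connector configuration:

# Test a remote login as the CDC user (password from the setup above)
PGPASSWORD=postgres psql -h 172.31.7.166 -p 5432 -U debezium_user -d cdc_target -c 'SELECT current_user, version();'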
Kafka & Debezium Setup
Install Docker Compose:
mkdir -p ~/debezium-cdc
cd ~/debezium-cdc
sudo apt install docker-compose
Create Docker Compose File
The Docker Compose file below sets up a complete CDC (Change Data Capture) infrastructure with:
- Message broker (Kafka)
- Coordination service (Zookeeper)
- Connector framework (Kafka Connect with Debezium)
- Monitoring UI (Kafka UI)
nano docker-compose.yml
version: '3.8'

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    container_name: zookeeper
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    volumes:
      - zookeeper-data:/var/lib/zookeeper/data
      - zookeeper-logs:/var/lib/zookeeper/log
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    container_name: kafka
    hostname: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://172.31.7.166:9092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
    volumes:
      - kafka-data:/var/lib/kafka/data
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

  kafka-connect:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: kafka-connect
    hostname: kafka-connect
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:29092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
      CONFIG_STORAGE_REPLICATION_FACTOR: 1
      OFFSET_STORAGE_REPLICATION_FACTOR: 1
      STATUS_STORAGE_REPLICATION_FACTOR: 1
      KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: 'true'
      CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: 'true'
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    depends_on:
      - kafka
      - kafka-connect
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: debezium-cluster
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
      KAFKA_CLUSTERS_0_KAFKACONNECT_0_NAME: debezium-connect
      KAFKA_CLUSTERS_0_KAFKACONNECT_0_ADDRESS: http://kafka-connect:8083
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

networks:
  cybertec_migrator_common:
    external: true

volumes:
  zookeeper-data:
  zookeeper-logs:
  kafka-data:
Install JDBC Drivers
The Dockerfile below customizes the Debezium Connect image by adding the JDBC drivers Debezium needs to:
- Connect to Oracle (source database)
- Connect to PostgreSQL (target database)
Without these drivers, Debezium cannot communicate with your databases.
cat > ~/debezium-cdc/Dockerfile << 'EOF'
FROM quay.io/debezium/connect:2.5
USER root
RUN cd /kafka/connect/debezium-connector-oracle && \
curl -fSL -o ojdbc8.jar \
"https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.11.0.0/ojdbc8-21.11.0.0.jar"
RUN cd /kafka/connect/debezium-connector-jdbc && \
curl -fSL -o postgresql.jar \
"https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.1/postgresql-42.7.1.jar"
USER 1001
EOF
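After the image is built (docker-compose up -d triggers the build on first run), you can confirm that both drivers landed in the expected locations:

# Verify the JDBC drivers were baked into the Connect image
docker exec kafka-connect ls -l /kafka/connect/debezium-connector-oracle/ojdbc8.jar /kafka/connect/debezium-connector-jdbc/postgresql.jar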
Start Kafka Stack
docker-compose up -d
# Wait for services to start
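Kafka Connect can take a minute or two to initialize; a minimal readiness check is to poll its REST API (port 8083 from the Compose file) until it responds:

# Poll the Kafka Connect REST API until it accepts requests
until curl -sf http://localhost:8083/connectors > /dev/null; do
  echo "Waiting for Kafka Connect..."
  sleep 5
done
echo "Kafka Connect is up"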
Oracle source connector
Create and configure the Oracle source connector:
nano oracle-connector.json
{
  "name": "oracle-hr-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "oracle-db",
    "database.port": "1521",
    "database.user": "c##debezium",
    "database.password": "dbz",
    "database.dbname": "ORCLCDB",
    "database.pdb.name": "ORCLPDB1",
    "topic.prefix": "oracle",
    "schema.include.list": "HR",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
    "schema.history.internal.kafka.topic": "schema-changes.hr",
    "log.mining.strategy": "online_catalog",
    "decimal.handling.mode": "string",
    "snapshot.mode": "schema_only",
    "include.schema.changes": "true"
  }
}
Configuration Explained:
- snapshot.mode: schema_only – Captures only the table structures; existing rows are migrated up front with Ora2pg
- decimal.handling.mode: string – Emits Oracle NUMBER values as strings to preserve precision
- log.mining.strategy: online_catalog – Uses LogMiner with the online catalog
Let’s register the Oracle source connector:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d @oracle-connector.json
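Once the source connector is running, every committed change in Oracle appears as a JSON event on the matching topic. Here is a minimal sketch of how to watch those events; note that with snapshot.mode set to schema_only, the topic stays empty until a change is made in Oracle:

# Stream change events for HR.EMPLOYEES (Ctrl+C to stop)
docker exec kafka kafka-console-consumer --bootstrap-server kafka:29092 --topic oracle.HR.EMPLOYEES --from-beginning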
PostgreSQL sink connector
Let’s configure the PostgreSQL sink connector
nano postgres-sink-connector.json
{
  "name": "postgres-hr-sink",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://172.31.7.166:5432/cdc_target",
    "connection.username": "debezium_user",
    "connection.password": "postgres",
    "topics.regex": "oracle\\.HR\\..*",
    "insert.mode": "upsert",
    "primary.key.mode": "record_key",
    "delete.enabled": "true",
    "schema.evolution": "basic",
    "table.name.format": "hr.${topic}",
    "quote.identifiers": "false",
    "transforms": "unwrap,route",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.delete.handling.mode": "none",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "oracle\\.HR\\.(.*)",
    "transforms.route.replacement": "$1"
  }
}
Configuration Explained:
- topics.regex – Matches all HR tables: oracle.HR.*
- insert.mode: upsert – INSERT or UPDATE based on primary key
- transforms.route – Strips topic prefix to get table name
- schema.evolution: basic – Auto-creates/updates table schema
Register the PostgreSQL sink connector
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d @postgres-sink-connector.json
Verify Connector Status
#For Oracle (Source)
curl -s http://localhost:8083/connectors/oracle-hr-connector/status | python3 -m json.tool
#For PostgreSQL (Target)
curl -s http://localhost:8083/connectors/postgres-hr-sink/status | python3 -m json.tool
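With both connectors reporting RUNNING, a quick end-to-end smoke test confirms the pipeline; a minimal sketch, assuming the HR.REGIONS table from the setup above (region_id 999 is a hypothetical throwaway test row):

# 1. Insert and commit a test row in Oracle (run on Machine-1)
sqlplus -s 'sys/your_password@//localhost:1521/ORCLPDB1 as sysdba' << 'EOF'
INSERT INTO HR.REGIONS (REGION_ID, REGION_NAME) VALUES (999, 'CDC Smoke Test');
COMMIT;
EXIT;
EOF

# 2. A few seconds later, the row should be visible in PostgreSQL (run on Machine-2)
PGPASSWORD=postgres psql -h 172.31.7.166 -U debezium_user -d cdc_target -c "SELECT * FROM hr.regions WHERE region_id = '999';"

An UPDATE or DELETE against the same row should replicate the same way, which Part 2 covers in depth.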
Conclusion
Migrating from Oracle to PostgreSQL doesn’t have to mean extended downtime or risky data transfers. By combining Ora2pg for initial schema and data migration with a robust CDC pipeline built on Debezium and Apache Kafka, organizations can achieve near-zero downtime transitions while maintaining data consistency throughout the process.
In this guide, we’ve walked through the complete setup of an open-source CDC infrastructure, from enabling Oracle’s archive log mode and configuring LogMiner to deploying Kafka Connect with Debezium connectors. The architecture we’ve built captures real-time changes from Oracle using Debezium’s Oracle connector, streams them through Kafka topics, and applies them to PostgreSQL using the JDBC sink connector—all with minimal latency and automatic schema evolution support.
With your connectors now registered and running, the CDC pipeline is actively monitoring Oracle’s transaction logs and replicating changes to PostgreSQL in real time. In Part 2, we’ll dive into comprehensive testing scenarios including:
- Validating data synchronization across INSERT, UPDATE, and DELETE operations
- Monitoring CDC lag and throughput using Kafka UI
- Handling edge cases like schema changes and bulk operations
- Performance tuning and troubleshooting common issues
- Executing the final cutover with confidence
The foundation is set—now it’s time to put this CDC pipeline through its paces and ensure your migration will be seamless when it matters most.
See this in action at PGConf India 2026 – Top mistakes when migrating from Oracle and SQL Server to PostgreSQL presented by Jim Mlodgenski.
