Oracle to PostgreSQL Migration: CDC with Debezium & Apache Kafka

In today’s open-source ecosystem, organizations are increasingly adopting open-source technologies to reduce licensing costs, avoid vendor lock-in, and improve flexibility and scalability. A common modernization initiative is migrating from proprietary databases such as Oracle to open-source platforms like PostgreSQL.

In any database migration, business continuity, data integrity, and minimal downtime are major concerns. Any disruption to production systems can affect operations, customer experience, and revenue. A carefully planned migration strategy is therefore essential to move data seamlessly from the source to the target database without data loss or interruption to ongoing transactions.

This approach relies entirely on open-source tools to achieve a reliable, near-zero downtime transition. Ora2pg handles the initial full load, covering schema conversion, data extraction, and bulk data transfer from Oracle to PostgreSQL. After the initial load, Change Data Capture (CDC) is implemented with Debezium and Apache Kafka to stream real-time database changes such as inserts, updates, and deletes to PostgreSQL.
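For reference, a minimal Ora2pg invocation for the initial load might look like the sketch below. The config file path, output file names, and export types are placeholders; a real migration would tune ora2pg.conf (connection settings, schema list, type mappings) to your environment.

# Hypothetical initial-load sketch; assumes ora2pg is installed and ora2pg.conf
# points at the source Oracle database and the target PostgreSQL schema.
ora2pg -c ora2pg.conf -t TABLE -o schema.sql   # export table DDL
ora2pg -c ora2pg.conf -t COPY  -o data.sql     # export data as COPY statements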

By combining a full data load with CDC, organizations can ensure continuous synchronization, maintain data consistency, and execute a smooth cutover from Oracle to PostgreSQL with minimal business impact.

Core Concepts & Technologies

What is CDC (Change Data Capture)?

Change Data Capture (CDC) is a design pattern that tracks and captures changes made to a database, such as row inserts, updates, and deletes, in real time and streams those changes to downstream systems. The architecture used to achieve CDC from Oracle to PostgreSQL is built from the components described below.

What is Debezium?

Debezium is an open-source distributed platform for Change Data Capture. It monitors your databases and streams row-level changes to Apache Kafka topics. Debezium supports multiple databases including Oracle, PostgreSQL, MySQL, MongoDB, and more.

Debezium works as a Kafka Connect connector. It provides:

  • Oracle Connector: Reads Oracle’s Redo/Archive Logs using LogMiner.
  • JDBC Sink Connector: Writes CDC events from Kafka topics into the target database (PostgreSQL).

Key advantages of Debezium:

  • Captures all changes: INSERT, UPDATE and DELETE.
  • Low latency: Changes streamed in milliseconds.
  • Fault tolerant: Resumes from last position after failure.
  • Schema aware: Understands table structures and handles schema changes.

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform designed to handle high-throughput, fault-tolerant, real-time data pipelines. In this migration, Kafka acts as the message bus between Oracle (source) and PostgreSQL (target).

Think of Kafka as a high-speed, durable conveyor belt: Debezium puts change events onto the belt (topics), and the JDBC Sink connector picks them up and applies them to PostgreSQL.

Key Kafka concepts in this setup:

  • Topics: Named channels where change events are published (e.g., oracle.HR.EMPLOYEES).
  • Producers: Debezium Oracle Connector reads Oracle logs and publishes events.
  • Consumers: Debezium JDBC Sink Connector reads events and writes to PostgreSQL.
  • Brokers: Kafka servers that store and route messages.
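Once the pipeline is running, you can peek at a topic to see what a change event looks like. The sketch below assumes the container name (kafka), internal listener (kafka:29092), and topic name (oracle.HR.EMPLOYEES) used later in this article; the CLI is kafka-console-consumer on Confluent images (other images ship it as kafka-console-consumer.sh).

# Read a few change events straight off the EMPLOYEES topic
docker exec -it kafka kafka-console-consumer \
  --bootstrap-server kafka:29092 \
  --topic oracle.HR.EMPLOYEES \
  --from-beginning --max-messages 5
# Each message is a JSON envelope with "before", "after", "op" (c/u/d) and "source" fields.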

What is Zookeeper?

Apache Zookeeper is a centralized coordination service that Kafka uses to manage cluster metadata, leader election, and broker registration. In simple terms, Zookeeper is the ‘orchestrator’ that keeps track of Kafka’s internal state.

Prerequisites

Environment Details

Here is the environment used to implement CDC, along with the relevant ports.

  • Host OS: Ubuntu 24.04
  • Oracle Database: Enterprise Edition 21.3.0.0 (Container: oracle-db)
  • PostgreSQL: Version 17 (Port: 5432)
  • Kafka: confluentinc/cp-kafka:7.5.0 (Port: 9092)
  • Zookeeper: 3.8 (Port: 2181)
  • Debezium Connect: 2.5 (Port: 8083)
  • Kafka UI: provectuslabs/kafka-ui (Port: 8080)

Network Configuration

All containers are attached to the same Docker bridge network:

  • Oracle DB: 172.18.0.2
  • Debezium: 172.18.0.6
  • Kafka: accessible via hostname kafka
  • Kafka UI: used for monitoring (Port: 8080)
  • Zookeeper: accessible via hostname zookeeper
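To confirm which network the containers share and what IPs they received, docker network inspect can be used; the network name below is the external network declared in the Compose file later in this guide.

# List networks, then print each container's name and IP on the shared network
docker network ls
docker network inspect cybertec_migrator_common \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'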

Oracle Database Configuration

Oracle CDC via LogMiner requires Archive Log Mode to be enabled. This ensures all transaction changes are persisted in archive log files for Debezium to read.

SQL> ARCHIVE LOG LIST;
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /opt/oracle/homes/OraDB21Home1/dbs/arch
Oldest online log sequence     3
Current log sequence           5
SQL> SHUTDOWN IMMEDIATE;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP MOUNT;
ORACLE instance started.

Total System Global Area 8589933520 bytes
Fixed Size                  9706448 bytes
Variable Size            1375731712 bytes
Database Buffers         7197425664 bytes
Redo Buffers                7069696 bytes
Database mounted.
SQL> ALTER DATABASE ARCHIVELOG;

Database altered.

SQL> ARCHIVE LOG LIST;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /opt/oracle/homes/OraDB21Home1/dbs/arch
Oldest online log sequence     3
Next log sequence to archive   5
Current log sequence           5
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL>
SQL> SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;
SUPPLEME
--------
YES
SQL> ALTER SESSION SET CONTAINER=ORCLPDB1;
Session altered.
SQL> SHOW CON_NAME;
CON_NAME
------------------------------
ORCLPDB1

Oracle Setup for Debezium CDC

Let’s run the following on Machine-1, where Oracle is running:

#Connect as SYSDBA to the CDB (Container Database)
sqlplus sys/your_password@//localhost:1521/ORCLCDB as sysdba

-- Step 1: Enable ARCHIVELOG mode (if not already enabled)
-- Check current mode:
-- SELECT LOG_MODE FROM V$DATABASE;
-- If NOARCHIVELOG, you need to restart in mount mode:
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;

-- Step 2: Enable supplemental logging at database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Step 3: Create the Debezium capture user (Common User in CDB)
CREATE USER c##debezium IDENTIFIED BY dbz
  DEFAULT TABLESPACE USERS
  QUOTA UNLIMITED ON USERS
  CONTAINER=ALL;

-- Step 4: Grant required privileges
GRANT CREATE SESSION TO c##debezium CONTAINER=ALL;
GRANT SET CONTAINER TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$DATABASE TO c##debezium CONTAINER=ALL;
GRANT FLASHBACK ANY TABLE TO c##debezium CONTAINER=ALL;
GRANT SELECT ANY TABLE TO c##debezium CONTAINER=ALL;
GRANT SELECT_CATALOG_ROLE TO c##debezium CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO c##debezium CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO c##debezium CONTAINER=ALL;
-- LogMiner specific privileges
GRANT LOGMINING TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOG_HISTORY TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##debezium CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO c##debezium CONTAINER=ALL;

-- Step 5: Enable supplemental logging for HR schema tables in PDB
ALTER SESSION SET CONTAINER = ORCLPDB1;
ALTER TABLE HR.COUNTRIES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.DEPARTMENTS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.EMPLOYEES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.JOBS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.JOB_HISTORY ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.LOCATIONS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE HR.REGIONS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Verify supplemental logging is enabled
SELECT SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_ALL FROM V$DATABASE;
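As a quick sanity check, log in as the new capture user and confirm it can read the LogMiner-related views. A minimal sketch, using the same connection string as above:

sqlplus c##debezium/dbz@//localhost:1521/ORCLCDB <<'EOF'
-- Should return row counts rather than ORA-00942 / ORA-01031 errors
SELECT COUNT(*) FROM V$LOG;
SELECT COUNT(*) FROM V$ARCHIVED_LOG;
EXIT;
EOF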

User creation in PostgreSQL

In parallel, create the user in PostgreSQL:

-- ============================================================
-- POSTGRESQL SETUP FOR DEBEZIUM CDC (Run on Machine-2 Postgres)
-- ============================================================
-- Connect as: psql -U postgres
-- ============================================================

-- Step 1: Create the debezium_user
CREATE USER debezium_user WITH PASSWORD 'postgres';

-- Step 2: Grant connect to the cdc_target database
GRANT CONNECT ON DATABASE cdc_target TO debezium_user;

-- Step 3: Switch to cdc_target database
-- \c cdc_target

-- Step 4: Grant usage on hr schema
GRANT USAGE ON SCHEMA hr TO debezium_user;

-- Step 5: Grant DML privileges on all existing tables in hr schema
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA hr TO debezium_user;

-- Step 6: Grant privileges on future tables created in hr schema
ALTER DEFAULT PRIVILEGES IN SCHEMA hr
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO debezium_user;

-- Step 7: Grant usage on sequences (if any)
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA hr TO debezium_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA hr
  GRANT USAGE, SELECT ON SEQUENCES TO debezium_user;
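Note that the grants above assume the cdc_target database and hr schema already exist on the PostgreSQL side. If they do not, a minimal sketch to create them (names taken from this article) is:

# Run as the postgres superuser before applying the grants above
psql -U postgres -c "CREATE DATABASE cdc_target;"
psql -U postgres -d cdc_target -c "CREATE SCHEMA IF NOT EXISTS hr;"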

Configure PostgreSQL Network Access

The following change needs to be made in the postgresql.conf file:

listen_addresses = '*'

Add the following entries to the pg_hba.conf file:

# Docker networks
host    all           all              172.17.0.0/16     md5
host    all           all              172.18.0.0/16     md5
host    all           all              172.31.0.0/16     md5
host    cdc_target    debezium_user    172.31.38.9/32    md5

Restart the PostgreSQL service:

sudo systemctl restart postgresql
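Before wiring up the sink connector, it is worth confirming that PostgreSQL now accepts remote connections for the CDC user. Run this from a host allowed in pg_hba.conf, and replace the address with your PostgreSQL server's IP (172.31.7.166 is the address used later in the sink connector configuration):

# Should prompt for the password and return the current user and database
psql -h 172.31.7.166 -p 5432 -U debezium_user -d cdc_target -c "SELECT current_user, current_database();"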

Kafka & Debezium Setup

Install Docker Compose

mkdir -p ~/debezium-cdc
cd ~/debezium-cdc
sudo apt install docker-compose

Create Docker Compose File

The Docker Compose file below sets up the complete CDC (Change Data Capture) infrastructure with:

  • Message broker (Kafka)
  • Coordination service (Zookeeper)
  • Connector framework (Kafka Connect with Debezium)
  • Monitoring UI (Kafka UI)

nano docker-compose.yml

version: '3.8'

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    container_name: zookeeper
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    volumes:
      - zookeeper-data:/var/lib/zookeeper/data
      - zookeeper-logs:/var/lib/zookeeper/log
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    container_name: kafka
    hostname: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://172.31.7.166:9092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
    volumes:
      - kafka-data:/var/lib/kafka/data
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

  kafka-connect:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: kafka-connect
    hostname: kafka-connect
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:29092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
      CONFIG_STORAGE_REPLICATION_FACTOR: 1
      OFFSET_STORAGE_REPLICATION_FACTOR: 1
      STATUS_STORAGE_REPLICATION_FACTOR: 1
      KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: 'true'
      CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: 'true'
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    depends_on:
      - kafka
      - kafka-connect
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: debezium-cluster
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
      KAFKA_CLUSTERS_0_KAFKACONNECT_0_NAME: debezium-connect
      KAFKA_CLUSTERS_0_KAFKACONNECT_0_ADDRESS: http://kafka-connect:8083
    networks:
      - cybertec_migrator_common
    restart: unless-stopped

networks:
  cybertec_migrator_common:
    external: true

volumes:
  zookeeper-data:
  zookeeper-logs:
  kafka-data:


Install JDBC Drivers

The Dockerfile below customizes the Debezium Connect image by adding the necessary JDBC drivers so that Debezium can:

  1. Connect to Oracle (source database)
  2. Connect to PostgreSQL (target database)

Without these drivers, Debezium cannot communicate with your databases.

cat > ~/debezium-cdc/Dockerfile << 'EOF'
FROM quay.io/debezium/connect:2.5

USER root

RUN cd /kafka/connect/debezium-connector-oracle && \
    curl -fSL -o ojdbc8.jar \
    "https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.11.0.0/ojdbc8-21.11.0.0.jar"

RUN cd /kafka/connect/debezium-connector-jdbc && \
    curl -fSL -o postgresql.jar \
    "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.1/postgresql-42.7.1.jar"

USER 1001
EOF
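The custom Connect image can be built ahead of time, and once the stack is up you can check that both drivers landed where the connectors expect them. A minimal sketch:

cd ~/debezium-cdc
docker-compose build kafka-connect
# Once the container is running (next step), verify the drivers are in place
docker exec kafka-connect ls -l /kafka/connect/debezium-connector-oracle/ojdbc8.jar
docker exec kafka-connect ls -l /kafka/connect/debezium-connector-jdbc/postgresql.jar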

Start Kafka Stack

docker-compose up -d
# Wait for services to start
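Kafka Connect can take a minute or two to become available. A simple readiness check using the standard Kafka Connect REST API is to poll the root endpoint and then confirm both Debezium plugins were loaded:

# Wait until the Connect REST API responds
until curl -sf http://localhost:8083/ > /dev/null; do sleep 5; done
# The Oracle source and JDBC sink connector classes should both be listed
curl -s http://localhost:8083/connector-plugins | python3 -m json.tool | grep -E 'OracleConnector|JdbcSinkConnector'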

Oracle source connector

Create and configure the Oracle source connector:

nano oracle-connector.json

{
  "name": "oracle-hr-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "oracle-db",
    "database.port": "1521",
    "database.user": "c##debezium",
    "database.password": "dbz",
    "database.dbname": "ORCLCDB",
    "database.pdb.name": "ORCLPDB1",
    "topic.prefix": "oracle",
    "schema.include.list": "HR",
    "database.history.kafka.bootstrap.servers": "kafka:29092",
    "database.history.kafka.topic": "schema-changes.hr",
    "log.mining.strategy": "online_catalog",
    "log.mining.continuous.mine": "true",
    "decimal.handling.mode": "string",
    "snapshot.mode": "schema_only",
    "include.schema.changes": "true"
  }
}

Configuration Explained:

  • snapshot.mode: schema_only – Captures only the table structures at startup; existing data is loaded separately (e.g., with Ora2pg)
  • decimal.handling.mode: string – Emits Oracle NUMBER values as strings to avoid precision loss
  • log.mining.strategy: online_catalog – Uses LogMiner with online catalog

Let’s register the Oracle source connector:

curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d @oracle-connector.json
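Once the source connector is registered, its topics appear as the schema history is captured and changes start flowing. They can be listed from inside the Kafka container (the CLI name assumes the Confluent image used in the Compose file):

# Topics created by the Oracle connector (per-table topics plus schema-changes.hr)
docker exec kafka kafka-topics --bootstrap-server kafka:29092 --list | grep -i 'oracle\|schema-changes'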

PostgreSQL sink connector

Let’s configure the PostgreSQL sink connector

nano postgres-sink-connector.json

{
    "name": "postgres-hr-sink",
    "config": {
        "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:postgresql://172.31.7.166:5432/cdc_target",
        "connection.username": "debezium_user",
        "connection.password": "postgres",
        "topics.regex": "oracle\\.HR\\..*",
        "insert.mode": "upsert",
        "primary.key.mode": "record_key",
        "delete.enabled": "true",
        "schema.evolution": "basic",
        "table.name.format": "hr.${topic}",
        "quote.identifiers": "false",
        "transforms": "unwrap,route",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "transforms.unwrap.drop.tombstones": "false",
        "transforms.unwrap.delete.handling.mode": "none",
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": "oracle\\.HR\\.(.*)",
        "transforms.route.replacement": "$1"
    }
}

Configuration Explained:

  • topics.regex – Matches all HR tables: oracle.HR.*
  • insert.mode: upsert – INSERT or UPDATE based on primary key
  • transforms.route – Strips the topic prefix to get the table name
  • schema.evolution: basic – Auto-creates/updates the target table schema

Register the PostgreSQL sink connector

curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d @postgres-sink-connector.json

Verify Connector Status

#For Oracle (Source)
curl -s http://localhost:8083/connectors/oracle-hr-connector/status | python3 -m json.tool

#For PostgreSQL (Target)
curl -s http://localhost:8083/connectors/postgres-hr-sink/status | python3 -m json.tool
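Both connectors should report RUNNING. As a quick end-to-end smoke test (Part 2 covers testing in depth), you can make a small change in Oracle and check that it arrives in PostgreSQL. A minimal sketch using the HR.REGIONS table; the HR password and the final identifier casing in PostgreSQL depend on your setup:

# On the Oracle side: insert and commit a test row (replace hr_password with the HR schema password)
sqlplus hr/hr_password@//localhost:1521/ORCLPDB1 <<'EOF'
INSERT INTO REGIONS (REGION_ID, REGION_NAME) VALUES (99, 'CDC Test');
COMMIT;
EXIT;
EOF

# On the PostgreSQL side, after a few seconds (identifier case depends on how the sink created the table)
psql -h 172.31.7.166 -U debezium_user -d cdc_target -c "SELECT * FROM hr.regions WHERE region_name = 'CDC Test';"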

Conclusion

Migrating from Oracle to PostgreSQL doesn’t have to mean extended downtime or risky data transfers. By combining Ora2pg for initial schema and data migration with a robust CDC pipeline built on Debezium and Apache Kafka, organizations can achieve near-zero downtime transitions while maintaining data consistency throughout the process.

In this guide, we’ve walked through the complete setup of an open-source CDC infrastructure, from enabling Oracle’s archive log mode and configuring LogMiner to deploying Kafka Connect with Debezium connectors. The architecture we’ve built captures real-time changes from Oracle using Debezium’s Oracle connector, streams them through Kafka topics, and applies them to PostgreSQL using the JDBC sink connector—all with minimal latency and automatic schema evolution support.

With your connectors now registered and running, the CDC pipeline is actively monitoring Oracle’s transaction logs and replicating changes to PostgreSQL in real time. In Part 2, we’ll dive into comprehensive testing scenarios including:

  • Validating data synchronization across INSERT, UPDATE, and DELETE operations
  • Monitoring CDC lag and throughput using Kafka UI
  • Handling edge cases like schema changes and bulk operations
  • Performance tuning and troubleshooting common issues
  • Executing the final cutover with confidence

The foundation is set—now it’s time to put this CDC pipeline through its paces and ensure your migration will be seamless when it matters most.

See this in action at PGConf India 2026 – Top mistakes when migrating from Oracle and SQL Server to PostgreSQL presented by Jim Mlodgenski.
