Blog

Performance tuning with Apache Spark – Introduction

Introduction: Welcome back to our ongoing series on data transformation with Apache Spark! In our previous posts, we’ve covered essential topics like setting up Apache Spark on Ubuntu, integrating data with Spark, and querying datasets using Apache Drill. Now, we’re … Read More

Using pg_index_watch for PostgreSQL Indexing

Let’s explore pg_index_watch. In this instalment, I will guide you through the rationale behind its creation and explain its operational nuances. Meet pg_index_watch – a utility for the automagical rebuild of bloated indexes, an absolutely handy tool designed to … Read More

PostgreSQL User Management: Best Practices & Security Nuances

PostgreSQL, The World’s Most Advanced Open Source Relational Database, stands out as a robust and feature-rich solution, offering extensive capabilities for user management. Effective user management is important for ensuring data security, integrity, and accessibility within the platform. … Read More

Open-source Data Engineering with PostgreSQL

Blog-4: Apache Drill Magic across PostgreSQL, Local Parquet, and S3. Introduction: Welcome back! Following our exploration of data movement between PostgreSQL and Amazon S3 in the previous blog, we now venture into the realm of querying with Apache Drill. In … Read More

Open-source Data Engineering with PostgreSQL

Blog-3: Data Loading with Apache Spark. Introduction: Welcome to the next installment of our series on Open-source Data Engineering with PostgreSQL. In this blog, we’ll delve into the practicalities of transforming table data from PostgreSQL into the Parquet format and … Read More

Open-source Data Engineering with PostgreSQL

Blog-2: Installation and Setup on Ubuntu. Introduction: Welcome back to the series on Open-source Data Engineering with PostgreSQL. In this post, we shall delve into the installation and configuration of Apache Spark and Apache Drill on an Ubuntu environment. Our … Read More

Open-source Data Engineering with PostgreSQL

Overview – A Curtain Raiser. Introduction: In the ever-evolving landscape of data management, organizations are constantly seeking efficient ways to handle, transform, and query massive datasets. Data archiving has become an important component of data engineering in the ever-evolving landscape … Read More

Mastering Timestamp-Based CDC Hurdles: Solution Implementation

Introduction: In the execution phase of mastering Timestamp-Based Change Data Capture (CDC) hurdles, the focus lies on implementing INSERT, DELETE, and UPDATE operations in our PostgreSQL database and using the proven solution we discussed in the previous blog to achieve … Read More

Mastering Timestamp-Based CDC Hurdles: A Proven Solution

Introduction: Have you experimented with Timestamp-Based Change Data Capture using the Pentaho Data Integration (PDI) tool? Achieving data replication from a source database to a target database through “Timestamp-Based Change Data Capture” with Pentaho Data Integration is indeed straightforward. Perhaps … Read More

Timestamp-based Change Data Capture

Introduction: Hey all! We hope you are following our series of topics on Data Integration with PostgreSQL, The World’s Most Advanced Open Source Relational Database. In our previous blog, we explored Change Data Capture (CDC) and its methods. If you … Read More