Brahmini Ratnam - 20/02/2024

Performance tuning with Apache Spark – Introduction

Introduction: Welcome back to our ongoing series on Data transformation with Apache Spark! In our previous posts, we’ve covered essential topics like setting up Apache Spark on Ubuntu, integrating data with Spark, and querying datasets using Apache Drill. Now, we’re …

Taraka Vuyyuru - 14/02/2024

Using pg_index_watch for PostgreSQL Indexing

Let’s explore pg_index_watch. In this instalment, I will guide you through the rationale behind its creation and explain its operational nuances. Meet pg_index_watch – a utility for automagical rebuild of bloated indexes, an absolutely handy tool designed to …

Taraka Vuyyuru - 07/02/2024

PostgreSQL User Management: Best Practices & Security Nuances

PostgreSQL, The World’s Most Advanced Open Source Relational Database, stands out as a robust and feature-rich solution, offering extensive capabilities for user management. Effective user management is important for ensuring data security, integrity, and accessibility within the platform. …

Brahmini Ratnam - 31/01/2024

Open-source Data Engineering with PostgreSQL

Blog-4: Apache Drill Magic across PostgreSQL, Local Parquet, and S3. Introduction: Welcome back! Following our exploration of data movement between PostgreSQL and Amazon S3 in the previous blog, we now venture into the realm of querying with Apache Drill. In …

Brahmini Ratnam - 24/01/2024

Open-source Data Engineering with PostgreSQL

Blog-3: Data Loading with Apache Spark. Introduction: Welcome to the next installment of our series on Open-source Data Engineering with PostgreSQL. In this blog, we’ll delve into the practicalities of transforming table data from PostgreSQL into the Parquet format and …

Brahmini Ratnam - 17/01/2024

Open-source Data Engineering with PostgreSQL

Blog-2: Installation and Setup on Ubuntu. Introduction: Welcome back to the series on Open-source Data Engineering with PostgreSQL. In this post, we shall delve into the installation and configuration of Apache Spark and Apache Drill on an Ubuntu environment. Our …

Brahmini Ratnam - 10/01/2024

Open-source Data Engineering with PostgreSQL

Overview – A Curtain Raiser. Introduction: In the ever-evolving landscape of data management, organizations are constantly seeking efficient ways to handle, transform, and query massive datasets. Data Archiving has become an important component of Data Engineering in the ever-evolving landscape …

Venkat Akhil - 03/01/2024

Mastering Timestamp-Based CDC Hurdles: Solution Implementation

Introduction: In the execution phase of mastering Timestamp-Based Change Data Capture (CDC) hurdles, the focus lies on implementing INSERT, DELETE, and UPDATE operations in our PostgreSQL database and using the proven solution we discussed in the previous blog to achieve …

Venkat Akhil - 27/12/2023

Mastering Timestamp-Based CDC Hurdles: A Proven Solution

Introduction: Have you experimented with Timestamp-Based Change Data Capture using the Pentaho Data Integration (PDI) tool? Achieving data replication from a source database to a target database through “Timestamp-Based Change Data Capture” with Pentaho Data Integration is indeed straightforward. Perhaps …

Venkat Akhil - 20/12/2023

Timestamp-based Change Data Capture

Introduction: Hey all! We hope you are following our series on Data Integration with PostgreSQL, The World’s Most Advanced Open Source Relational Database. In our previous blog, we explored Change Data Capture (CDC) and its methods. If you …

Venkat Akhil - 06/12/2023

Pentaho Data Integration with PostgreSQL

Introduction: Pentaho Data Integration (PDI) serves as a robust ETL (Extract, Transform, Load) tool, playing a pivotal role in handling the complexities of data ingestion pipelines. As organizations accumulate vast amounts of data from diverse sources and in different formats, …

Venkat Akhil - 29/11/2023

PostgreSQL Data Collector

Introduction: Embarking on the journey of efficient database management often requires reliable tools, and in the realm of PostgreSQL, a powerful ally comes in the form of a Python-based utility – the PostgreSQL Data Collector. Developed to streamline data collection …

Taraka Vuyyuru - 22/11/2023

Mastering PostgreSQL: Rollback to Savepoints !!

Introduction: The word “Transaction” may ring many bells, but in this topic’s context a “Transaction” is a call to a database function or procedure that can contain one or more DML operations such as INSERT, UPDATE, or DELETE. …

Brahmini Ratnam - 15/11/2023

Building an Efficient Data Pipeline with PostgreSQL and Talend Open Studio

Introduction: In the rapidly evolving landscape of data management, creating a robust data pipeline is essential for organizations to derive meaningful insights and drive informed decision-making. In this blog, we’ll explore the integration of PostgreSQL, a powerful open-source relational database, …

Venkat Akhil - 01/11/2023

Logical replication from Standbys

Introduction: One of the most useful features in PostgreSQL 16 is the ability to perform logical replication from physical replication standbys. This feature allows users to stream data to other PostgreSQL instances, giving developers new options for workload distribution. Additionally, …
