Timestamp-based Change Data Capture

Introduction

Hey all!! Hope you are following the series of topics around Data Integration with PostgreSQL, The World’s Most Advanced Open Source Relational Database. In our previous blog, we explored Change Data Capture (CDC) and its methods. If you missed it, we recommend reading that blog first.

Timestamp-based CDC (Change Data Capture) is one approach to capturing and tracking changes in a database by associating a timestamp with each change. It is a versatile and widely used approach for tracking changes in databases and supports use cases such as data replication and synchronization. In this blog, we will walk through the prerequisites for timestamp-based CDC, the setup, and the resulting output.

Setup the Environment

Before diving into CDC, let’s ensure we have a timestamp column in the table (source) and the necessary components installed and configured.

PostgreSQL Database

  • Install PostgreSQL on the server or your local machine.
  • Create a database and the required tables that will be part of the CDC.

Pentaho Data Integration (ETL tool)

  • Install PDI on the server or your local machine – you may refer to this blog for a step-by-step procedure.
  • Connect it to the databases.
  • Create the data pipeline for CDC.

Designing the CDC Pipeline

In this CDC pipeline, we will create a new transformation and build a pipeline that extracts data from one table (t1) in one PostgreSQL database and loads it into another table (t2) in a second PostgreSQL database on an incremental basis.

Define the date range

  • Go to Design → Input → Get system info
  • Here, we get the start and end of the date range for this transformation run.
  • Hop the result to the next step as input.

Export data from Source

  • Go to Design → Input → Table input
  • Here, we will extract the data from the table (t1) using the timestamp column by supplying a SQL statement (a sample query follows this list).
  • Hop the result to the next step as input.
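
For reference, here is a minimal sketch of the SQL the Table input step could run. It assumes the source table t1 has the columns id, name, and last_update, and that the step is configured to read its two parameters (the start and end of the date range) from the Get System Info step:

-- Illustrative query for the Table input step; the two ? placeholders are filled,
-- in order, by the start and end dates coming from the previous step.
SELECT id, name, last_update
FROM t1
WHERE last_update > ?
  AND last_update <= ?
ORDER BY last_update;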

Import data into Target

  • Go to Design → Output → Insert/update
  • Here, we insert the data (the output of the previous step) into the target table (t2) as per the given conditions (see the sketch after this list).
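
The Insert/update step itself is configured through its dialog (lookup keys and fields to update), not with SQL. Purely as an illustration of the logic it applies, and assuming id is the primary key of t2, the equivalent PostgreSQL statement would look like this:

-- Illustrative upsert equivalent of the Insert/update step (not how PDI executes it)
INSERT INTO t2 (id, name, last_update)
VALUES (1, 'sample', now())
ON CONFLICT (id)
DO UPDATE SET name        = EXCLUDED.name,
              last_update = EXCLUDED.last_update;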

Input and Output

  • Assume that both tables (t1 and t2), which live in different databases and have the columns (id, name, and last_update), start out with no data.
  • Insert the data into the source table (t1) using an INSERT command (sample commands follow this list).
  • Check the data in the target table (t2) before running the transformation.
  • Run the transformation. If the transformation is successful, you will see a ✅ at the top of each step.
  • Now, check the target table (t2).
  • Compare the data in both tables (t1 and t2).
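
A minimal sketch of the commands used for this check, assuming the column names from the example above:

-- Hypothetical test data for the source table t1
INSERT INTO t1 (id, name, last_update)
VALUES (1, 'alice', now()),
       (2, 'bob',   now());

-- Target table before the run (expected: 0 rows)
SELECT count(*) FROM t2;

-- After a successful run, both tables should return the same rows
SELECT * FROM t1 ORDER BY id;
SELECT * FROM t2 ORDER BY id;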

Note: As discussed in the previous blog, timestamp-based CDC will not capture DELETE operations.

Conclusion

In conclusion, the adoption of timestamp-based Change Data Capture (CDC) in PostgreSQL marks a pivotal stride in the realm of data integration. By leveraging this sophisticated approach, you can build systems to capture and propagate real-time changes efficiently. The journey through the nuances of timestamp-based CDC in PostgreSQL+Pentaho DI showcases not just a technical evolution but a strategic leap towards a more responsive and interconnected data ecosystem. As we witness the seamless synchronization of data across diverse platforms and applications, the potential for data-driven decision-making becomes boundless. In the ever-evolving landscape of data integration, embracing timestamp-based CDC in PostgreSQL is not just a best practice—it’s a transformative key to unlocking the true power of real-time data.

Stay tuned for more in the Data Integration with PostgreSQL blog post series!!

Unleashing the Power of Change Data Capture

Introduction

Here I am again! Talking about the series of topics around Data Integration with PostgreSQL, The World’s Most Advanced Open Source Relational Database. If you haven’t looked at the previous blog in the series, I’d highly recommend reading it; it is available here.

In the ever-evolving landscape of data management, organisations are constantly seeking ways to stay ahead of the modifications, additions and deletions in real-time. CDC is a transformative technology that allows organisations to achieve data integrity and consistency across all systems and deployment environments. Additionally, it allows organisations to use the right tool for the right job by moving data from legacy databases to purpose-built data platforms, such as document or search databases and data warehouses.
Change Data Capture (CDC) has emerged as a game-changer, providing a dynamic solution to keep pace with the rapidly changing world of data. This blog explores the three W’s of CDC and its methods.

Three W’s of Change Data Capture

What is Change Data Capture?

Change Data Capture (CDC) is a technique used to capture the changes made to data in a source database so that those changes can also be applied to a target database. Replication is one of the important use cases: where data is constantly changing, CDC reduces the load by sending only the incremental data.

Why is Change Data Capture important?

CDC enables real-time synchronisation of data between source and target systems. This is critical for applications where access to the most current information is essential, such as financial transactions or healthcare.

Unlike traditional methods that involve replicating entire datasets, CDC focuses only on the changes. This reduces the processing overhead, minimises network traffic, and optimises resource utilisation.

When to use Change Data Capture?

CDC is particularly useful when dealing with datasets that undergo frequent changes. In scenarios where data is constantly updated, capturing only the changes helps maintain system efficiency.

In environments where data warehouses and analytical systems need to stay current, implementing CDC ensures that updates are propagated swiftly and without the need for extensive data transfers.

Methods of Change Data Capture

Change Data Capture (CDC) comes in various flavours, each with its own specific use cases and requirements. The primary methods of CDC we want to discuss are:

Timestamp based CDC

In timestamp based CDC, we use a timestamp column in the source table, named “last_updated”, “last_modified” or “date_modified” (as per your convenience), to identify changes.
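
As a quick sketch, a source table prepared for this method could look like the one below (table and column names are only examples). Note that every change must refresh the timestamp column, which is also one of the cons listed further down:

-- Illustrative source table for timestamp-based CDC
CREATE TABLE t1 (
    id           int PRIMARY KEY,
    name         text,
    last_updated timestamptz NOT NULL DEFAULT now()
);

-- The application (or a trigger) must maintain the column on every change,
-- otherwise the modification stays invisible to the CDC extraction query.
UPDATE t1
SET name = 'new value',
    last_updated = now()
WHERE id = 1;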

Pros: 

  • Simple to implement, especially when timestamps are readily available.
  • No additional load on the source database compared to trigger-based CDC.

Cons:

  • The timestamp column must be updated on every change, either by the application or by a trigger.
  • Delete operations are not identified, as deleted rows leave no timestamp behind.

Log based CDC

In log based CDC, changes are captured directly from the transaction logs of the source database. This method is efficient because it reads the logs generated by the database management system (DBMS) to identify changes.
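
In PostgreSQL, log-based CDC is typically built on logical decoding. Below is a minimal sketch, assuming wal_level is set to logical and using the built-in test_decoding output plugin (production setups usually rely on a dedicated plugin or tool instead):

-- Create a logical replication slot that decodes changes from the WAL
SELECT pg_create_logical_replication_slot('cdc_slot', 'test_decoding');

-- Peek at the captured INSERT/UPDATE/DELETE events without consuming them
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);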

Pros:

  • More reliable and efficient.
  • There is no need to change the schemas of the database system or add additional columns.

Cons:

  • More complex to implement.
  • Dependency on database-specific features.

Trigger based CDC

In trigger based CDC, changes in the source are identified using the database triggers. Triggers are special stored procedures that are automatically executed when a specified event (such as an update, insert, or delete) occurs on a particular table.
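
A minimal sketch of this approach, assuming a source table t1 (id, name) and an audit table that records the change events:

-- Audit table that records each change event (illustrative schema)
CREATE TABLE t1_audit (
    audit_id   bigserial PRIMARY KEY,
    operation  text        NOT NULL,
    row_id     int         NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now()
);

-- Trigger function: logs the operation and the affected row's key
CREATE OR REPLACE FUNCTION t1_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO t1_audit (operation, row_id) VALUES (TG_OP, OLD.id);
        RETURN OLD;
    ELSE
        INSERT INTO t1_audit (operation, row_id) VALUES (TG_OP, NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t1_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON t1
FOR EACH ROW EXECUTE FUNCTION t1_audit_fn();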

Pros:

  • Simple to implement.
  • Directly tied to database events, making it easy to understand.

Cons:

  • Can introduce additional overhead on the source database.
  • May not be as real-time as log-based approaches.
  • Complexity increases with the number of triggers.

Conclusion

This concludes our exploration of Change Data Capture, its uses, and its methods. Change Data Capture is not just a technological tool; it is a strategic asset for organisations navigating the complexities of a data-driven world. By embracing this approach, businesses can not only adapt to change but thrive in an environment where accurate, timely, and reliable data is the cornerstone of success.

If you are interested in exploring CDC in your PostgreSQL database using Pentaho Data Integration (PDI), please reach out to us; let’s connect, collaborate, and extract more from your PostgreSQL database.

Thank you and Stay tuned!!

Pentaho Data Integration with PostgreSQL

Introduction

Installation:

  • Pentaho Data Integration Community Edition
  • Ubuntu 16 or above
  • JDK 11 or above (Java Development Kit)

Step-1: Downloading the Pentaho Data Integration (PDI/Kettle) Software

Step-2: Extracting the zip file

Step-3: Checking Java Availability

Step-4: Launching Spoon

How to connect to the PostgreSQL database:

  • Before connecting from Spoon, we need to allow the IP address of the Pentaho server in the pg_hba.conf file (/etc/postgresql/*/main/pg_hba.conf).
  • We need to change listen_addresses from ‘localhost’ to ‘*’ in the postgresql.conf file (/etc/postgresql/*/main/postgresql.conf). Locate the line #listen_addresses = ‘localhost’, uncomment it, and change the value to ‘*’ (see the SQL sketch below).
  • Restart the PostgreSQL server.
  • Open Spoon and create a new transformation.
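
If you prefer not to edit postgresql.conf by hand, the same listen_addresses change can be made with ALTER SYSTEM; this is only a sketch of an alternative, and a server restart is still required for the setting to take effect:

-- Written to postgresql.auto.conf instead of editing postgresql.conf directly
ALTER SYSTEM SET listen_addresses = '*';
-- Restart the server, then verify:
SHOW listen_addresses;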


  • Click on OK to close the test window.
  • Click on OK again to close the database definition window. A new database connection is added to the tree.
  • Right-click on the database connection and click on Share. The connection is available in all transformations you create from now onwards.

Next steps:

From Locking to Lockstar: Unveiling the enhancements in PostgreSQL 11 and 14

Understanding Locking in PostgreSQL

Locking is like the security guard of a database: it ensures data integrity and prevents conflicts when multiple users access the database at the same time. Before PostgreSQL 12, “AccessShareLock” locks were acquired on all partitions and their indexes when querying a partitioned table. These locks kept the table structure consistent during concurrent operations, preventing unwanted changes.

Introducing Fastpath Locks

Enter “fastpath locks” – lightweight locks designed for efficiency in low-contention scenarios. These locks are managed directly by individual database connections, optimized for speed and performance. But there’s a catch – they’re capped at 16 locks per connection. When that limit is crossed, the central lock manager takes over, providing more comprehensive control.
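
A quick way to see this in practice is to query pg_locks from within an open transaction that has touched a few tables; the sketch below lists your own session's locks and whether each one took the fastpath:

-- Run inside an open transaction after selecting from some tables
SELECT locktype, relation::regclass AS relation, mode, fastpath
FROM pg_locks
WHERE pid = pg_backend_pid()
ORDER BY fastpath DESC;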

The real magic lies in how fastpath locks streamline common locking situations, offering impressive performance benefits. But remember, their behavior can change depending on transaction duration. To fully understand their impact, tailored benchmark tests are essential.

To begin, let’s examine the structure of the pgbench_accounts table. This table is partitioned based on the aid column using the RANGE strategy, distributing rows across various child tables like pgbench_accounts_1, pgbench_accounts_2, and so forth. Each child table contains a range of aid values.

-- Table structure
CREATE TABLE public.pgbench_accounts (
aid int4 NOT NULL,
bid int4 NULL,
abalance int4 NULL,
filler bpchar(84) NULL,
CONSTRAINT pgbench_accounts_part_pkey PRIMARY KEY (aid)
)
PARTITION BY RANGE (aid);
CREATE TABLE pgbench_accounts_1 PARTITION OF pgbench_accounts FOR VALUES FROM (100001) TO (110001);
CREATE TABLE pgbench_accounts_2 PARTITION OF pgbench_accounts FOR VALUES FROM (110001) TO (120001);
CREATE TABLE pgbench_accounts_3 PARTITION OF pgbench_accounts FOR VALUES FROM (120001) TO (130001);
CREATE TABLE pgbench_accounts_4 PARTITION OF pgbench_accounts FOR VALUES FROM (130001) TO (140001);
CREATE TABLE pgbench_accounts_5 PARTITION OF pgbench_accounts FOR VALUES FROM (140001) TO (150001);
CREATE TABLE pgbench_accounts_6 PARTITION OF pgbench_accounts FOR VALUES FROM (150001) TO (160001);
CREATE TABLE pgbench_accounts_7 PARTITION OF pgbench_accounts FOR VALUES FROM (160001) TO (170001);
CREATE TABLE pgbench_accounts_8 PARTITION OF pgbench_accounts FOR VALUES FROM (170001) TO (180001);
CREATE TABLE pgbench_accounts_9 PARTITION OF pgbench_accounts FOR VALUES FROM (180001) TO (190001);
CREATE TABLE pgbench_accounts_10 PARTITION OF pgbench_accounts FOR VALUES FROM (190001) TO (200001);
CREATE TABLE pgbench_accounts_11 PARTITION OF pgbench_accounts FOR VALUES FROM (200001) TO (210001);
CREATE TABLE pgbench_accounts_12 PARTITION OF pgbench_accounts FOR VALUES FROM (210001) TO (220001);
CREATE TABLE pgbench_accounts_13 PARTITION OF pgbench_accounts FOR VALUES FROM (220001) TO (230001);
CREATE TABLE pgbench_accounts_14 PARTITION OF pgbench_accounts FOR VALUES FROM (230001) TO (240001);
CREATE TABLE pgbench_accounts_15 PARTITION OF pgbench_accounts FOR VALUES FROM (240001) TO (250001);
CREATE TABLE pgbench_accounts_16 PARTITION OF pgbench_accounts FOR VALUES FROM (250001) TO (260001);
CREATE TABLE pgbench_accounts_17 PARTITION OF pgbench_accounts FOR VALUES FROM (260001) TO (270001);
CREATE TABLE pgbench_accounts_18 PARTITION OF pgbench_accounts FOR VALUES FROM (270001) TO (280001);
CREATE TABLE pgbench_accounts_19 PARTITION OF pgbench_accounts FOR VALUES FROM (280001) TO (290001);
CREATE TABLE pgbench_accounts_20 PARTITION OF pgbench_accounts FOR VALUES FROM (290001) TO (300001);

Investigating Locking Behavior Changes

Let’s dive into the main focus of this post: the change in locking behavior from PostgreSQL 11 to PostgreSQL 14. We’ll compare the locking behavior when querying for the aid value of 100002 across all partitions in both versions.

PostgreSQL 11: Locking Behavior

In PostgreSQL 11, when querying for aid = 100002, the following locks are acquired:

postgres@ubuntu:~$ psql -d part_test
psql (11.20 (Ubuntu 11.20-1.pgdg20.04+1))
Type "help" for help.

part_test=# begin;
BEGIN
part_test=*# select * from pgbench_accounts where aid = 100002;
    aid    |    bid    | abalance  |                                        filler
-----------+-----------+-----------+--------------------------------------------------------------------------------------
 100002 | 100002 | 100002 | filler
(1 row)

part_test=# select count(*) from (select nspname, relname,l.locktype, l.pid,l.mode, l.granted, l.fastpath from pg_locks l join pg_class c on (relation = c.oid) join pg_namespace nsp on (c.relnamespace = nsp.oid)
where nspname not in ('pg_catalog','information_schema') and  pid in (select pid  from pg_stat_activity
 where datname = current_database()
 and query != current_query())
  order by l.fastpath
) as lock_test;
 count
-------
    41
(1 row)

PostgreSQL 14: Locking Behavior

After upgrading to PostgreSQL 14, querying for the same aid = 100002 reveals a distinct locking behavior:

postgres@ubuntu:~$ psql -d part_test
psql (14.8 (Ubuntu 14.8-1.pgdg20.04+1))
Type "help" for help.

part_test=# begin;
BEGIN
part_test=*# select * from pgbench_accounts where aid = 100002;
    aid    |    bid    | abalance  |                                        filler
-----------+-----------+-----------+--------------------------------------------------------------------------------------
 100002 | 100002 | 100002 | filler
(1 row)

part_test=# select nspname, relname,l.locktype, l.pid,l.mode, l.granted, l.fastpath from pg_locks l
    join pg_class c on (relation = c.oid)
    join pg_namespace nsp on (c.relnamespace = nsp.oid)
where nspname not in ('pg_catalog','information_schema') and  pid in (select pid
              from pg_stat_activity
              where datname = current_database()
                and query != current_query())
order by l.fastpath;
 nspname |         relname         | locktype |  pid  |      mode       | granted | fastpath
---------+-------------------------+----------+-------+-----------------+---------+----------
 public  | pgbench_accounts_1_pkey | relation | 14485 | AccessShareLock | t       | t
 public  | pgbench_accounts_1      | relation | 14485 | AccessShareLock | t       | t
 public  | pgbench_accounts        | relation | 14485 | AccessShareLock | t       | t
(3 rows)

Workload Variability: Locking behavior can differ based on individual workload characteristics. Thoroughly test the impact of these changes on your application’s performance.

Informed Decision-Making: Prior to upgrading PostgreSQL, assess the locking behavior changes, and ensure that your application’s performance won’t be negatively affected.

Comparing All Partitions: PostgreSQL 11 vs. PostgreSQL 14

Here’s a comprehensive comparison of locking behavior across all partitions of the pgbench_accounts table between PostgreSQL 11 and PostgreSQL 14.

(Note: Due to space constraints, we’re showcasing a subset of partitions.)

 Partition           | PostgreSQL 11 Locks Acquired | PostgreSQL 14 Locks Acquired
---------------------+------------------------------+------------------------------
 pgbench_accounts_1  | Locks                        | Fewer Locks
 pgbench_accounts_2  | Locks                        | No Lock
 pgbench_accounts_14 | Locks                        | No Lock
 pgbench_accounts_15 | Locks                        | No Lock

The AccessShareLock in PostgreSQL is a double-edged sword, offering concurrency while posing potential hurdles. Let’s look into its disadvantages:

Contention at the Table Level: Although AccessShareLock permits concurrent reads, it is still a table-level lock. When a concurrent transaction needs a stronger, conflicting lock on the same table (for example, DDL that requires ACCESS EXCLUSIVE), it has to wait, which can stall progress.

Blocking Writes: Ordinary INSERT, UPDATE, and DELETE statements are not blocked by AccessShareLock, but transactions that need an exclusive lock on the whole table (such as TRUNCATE, DROP TABLE, or many forms of ALTER TABLE) are blocked by sessions still holding AccessShareLocks, delaying those modifications.

Deadlock Danger: Poorly coordinated lock acquisitions can trigger deadlocks. When transactions chase incompatible locks, they could deadlock, stalling the system.

Performance Overhead: Locks incur overhead. Frequent lock acquisitions and releases could impact performance, particularly in highly concurrent scenarios.

Read-Only Overkill: For purely read-only workloads, this table-level bookkeeping is mostly overhead. Keeping read transactions short limits how long AccessShareLocks are held and how often they collide with schema changes, while MVCC snapshots already provide read consistency without additional locking.

In conclusion, while AccessShareLock has merits, its drawbacks warrant careful consideration. Tailor your lock strategy to your application’s needs, optimizing concurrency without sacrificing performance.

Summary:

The understanding that PostgreSQL versions prior to 12 acquire locks on all partitions and indexes may not be universally applicable. It is essential to conduct specific testing to confirm the impact of such locks. In our own testing, we observed no discernible impact before and after upgrading to PostgreSQL version 13. Therefore, it is crucial to evaluate the locking behavior based on individual workload characteristics and conduct thorough testing to ascertain the actual impact.




Six Strategies for Improving Customer Experience (CX) & Customer Success (CS)

Why is Customer Success your golden ticket to business success?

In the last post, I shared my interest and learning experience on PostgreSQL. In this blog, I will share more details on Why Customer Success is crucial for any business.

Customer Experience: Customer Experience (aka CX) is often defined as the sum total of all interactions between a business and its customers, from pre-purchase activities such as researching a product or service, up through post-sale customer service. Every touchpoint in this process – from start to finish – should be tailored to create a positive experience for the customer. This includes both digital interactions, such as websites and mobile apps, as well as physical experiences like store visits or contact with sales representatives. Companies should take into account not just what products they offer, but also how their customers interact on any platform or device.

Businesses that take CX seriously are able to increase revenue, as an improved experience will encourage more people to become loyal customers who remain engaged for longer periods of time. Additionally, businesses that prioritize CX are less likely to be impacted by competitors due to their strong and loyal customer base.

Customer Experience is an increasingly important element in the success of businesses today. It has become essential for companies to understand and define what customer experience means to them, so that they can provide the best possible service or product for their customers.

Customer Success: Customer Success (aka CS) is a strategy that focuses on ensuring customer satisfaction with one’s product or service. The goal of customer success initiatives is to create relationships with customers that foster loyalty, increase revenue, and reduce churn, while helping to ensure that customers get the most out of products and services.

Customer Success teams employ a variety of tactics in order to achieve their goals. They must be able to identify customer needs, build long-term relationships with clients, provide personalized support, utilize data analytics for optimization purposes, implement effective onboarding strategies for new customers, and track customer feedback closely in order to ensure continuous improvement. Ultimately all these efforts are made with the aim of creating an environment where customers are highly satisfied so they can make the most out of their investment in the product or service they bought.

Measuring Customer Experience & Customer Success

Companies often use different methods to measure Customer Experience and Customer Success. Some rely on benchmarks such as the Net Promoter Score (NPS), some still use traditional Customer Satisfaction (C-Sat) scores, while others track these using key performance indicators (KPIs) and service level agreements (SLAs).

While it ultimately boils down to the company’s business goals and the nature of the products or services being offered – and there is no hard-and-fast rule on how these should be measured – it is important that you regularly capture customer feedback in some form, analyze the data collected, and use the insights to improve your product or service offerings. Doing so will help you increase Customer Retention Rate, Repeat Purchase Rate, Upsell and Cross-Sell Rate, Monthly Recurring Revenue (MRR), and Customer Lifetime Value (CLV). For example, OpenSource DB, being a service-oriented company, oversees day-to-day Customer Experience and Customer Success by focusing on customer retention rate and proactively working to deliver valuable products and services.

Challenges in Delivering Exceptional CX & CS

In today’s customer-centric landscape, delivering an exceptional Customer Experience (CX) is becoming increasingly important for businesses. With the introduction of digital technologies, companies are striving to provide personalized experiences that keep customers engaged and coming back for more. However, this isn’t without its challenges. One major challenge to delivering CX is analytics and data management capabilities. Companies need the ability to collect customer data from all sources and analyze it quickly in order to make real-time decisions on how best to serve their customers. Without the right tools, it can be difficult or even impossible to understand how customers are engaging with your product or service.

Need help in analyzing or organizing your data? – Contact Us

Secondly, identifying the resources needed for a strong customer success program may not be easy in practice. It takes manpower, technology, data analysis and other elements in order to create effective strategies for helping customers reach their goals. Additionally, companies need to consider how much funding should be allocated for these initiatives, as well as how best to track progress and measure results over time.  Another challenge lies in managing customer expectations; businesses must be able to provide relevant support throughout the customer journey while still maintaining profitability.

Six Strategies for Improving CX & CS

As customers demand more personalized experiences, businesses must focus on delivering exceptional CX to remain competitive and stand out from their competition. They should also have an effective CS strategy in place so as to enable the customers to make the best use of the products and services, and increase customer loyalty.

Here are six strategies that can be used to improve the overall customer experience: 

1. Get to know your customers – Understand who your customers are and what they need from you by collecting feedback and analyzing customer behavior data. Use this information to tailor your services and products accordingly, as well as create content that will resonate with them.

2. Create an onboarding process for new customers. This onboarding process should include providing detailed instructions on how to use the product or service as well as offering technical support and answering any questions they may have about it. 

3. Be available when needed – Customers want answers fast, so make sure you’re always available on the channels they use the most such as social media or email support.

4. Maintain strong communication with customers throughout their journey – This involves delivering regular updates about product changes, sending helpful resources such as blog posts or videos, and engaging with customers through social media platforms like Facebook and Twitter.

5. Keep track of any complaints or compliments that are received from customers.

6. Use analytics tools such as Google Analytics to track website interactions and user behavior, which can help businesses identify opportunities for growth and improvement.

Key Responsibilities of Customer Success Manager (CSM) at OpenSource DB

1.   Manage a portfolio of customers across various industry sectors and domains.

2.   Educate customers about best practices, industry developments, and how to effectively utilize Postgres DBMS to manage their business data.

3.   Understand both business challenges and technical requirements.

4.   Identify opportunities for revenue expansion through upselling and cross-selling.

Scale and optimise your business with OpenSource DB. Contact us today to learn more about our Postgres DBA services. Schedule a demo with our team to see exactly how we can help you with your data management objectives.

For Business Enquiries Contact
Raviteja Reddy K
Customer Success Manager – OpenSource DB
Hyderabad, India | +91-8886 818 173 | https://opensource-db.com/
https://www.linkedin.com/in/ravi-tej-reddy/