Instrumenting production databases using triggers

In this short series on database replication techniques, we have already covered polling for changes.

As mentioned, polling is very inefficient, which leads us to another technique...

Sprinkling triggers all over your production database

Let us create triggers to intercept all data modification statements on tables. For each table we are interested in, we create a separate trigger for INSERT, UPDATE, and DELETE. These triggers write a full log of what changed into a so-called "shadow table", as sketched below.
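Here is a minimal sketch of such a setup in PostgreSQL syntax. The orders table, the orders_shadow table, and the orders_log_change function are hypothetical names, and the three triggers are collapsed into a single one covering all three operations for brevity:

    -- Shadow table holding a full log of changes to "orders".
    CREATE TABLE orders_shadow (
        id         bigserial   PRIMARY KEY,
        operation  text        NOT NULL,  -- 'INSERT', 'UPDATE' or 'DELETE'
        changed_at timestamptz NOT NULL DEFAULT now(),
        row_data   jsonb       NOT NULL   -- full copy of the affected row
    );

    -- Trigger function: log the new row on INSERT/UPDATE, the old row on DELETE.
    CREATE FUNCTION orders_log_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO orders_shadow (operation, row_data) VALUES (TG_OP, to_jsonb(OLD));
            RETURN OLD;
        END IF;
        INSERT INTO orders_shadow (operation, row_data) VALUES (TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- One trigger covering all three data modification statements.
    CREATE TRIGGER orders_capture
        AFTER INSERT OR UPDATE OR DELETE ON orders
        FOR EACH ROW EXECUTE FUNCTION orders_log_change();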

Unfortunately, this approach comes with the following side effects:

  • We slow down writes to the production database, because every write must also be recorded in the shadow table
  • We need to create an additional shadow table for each tracked table
  • If the source table's definition changes, we need to maintain a compatible structure in the shadow table, or use a less efficient unstructured format (like JSON) in it
  • We need to properly clean up all this baggage if we want to stop replicating the data (see the teardown sketch after this list)
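Continuing the hypothetical sketch above, retiring replication for a single tracked table would mean dropping everything we created for it:

    -- Retiring replication for "orders": remove the trigger,
    -- its function, and the shadow table it fed.
    DROP TRIGGER IF EXISTS orders_capture ON orders;
    DROP FUNCTION IF EXISTS orders_log_change();
    DROP TABLE IF EXISTS orders_shadow;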

Benefits of this approach

We still need to poll for changes to replicate, but now we have a far more efficient data structure to poll. Instead of examining all tables and all of their data, we only have to examine the changes logged since the last run. That means several orders of magnitude less work.
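Against the sketch above, a replication run could be just an indexed range scan, assuming the replicator remembers the last shadow-table id it processed (the $1 placeholder below):

    -- Fetch only the changes logged since the last run; $1 is the
    -- last shadow-table id the replicator has already processed.
    SELECT id, operation, changed_at, row_data
    FROM orders_shadow
    WHERE id > $1
    ORDER BY id
    LIMIT 1000;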

Warning

If our replication process stops for any reason, the shadow tables in the production database will continue to grow.
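One way to keep that growth bounded, again assuming the sketch above, is to have the replicator (or a scheduled job) trim the entries it has already consumed:

    -- Trim shadow-table entries the replicator has already processed;
    -- $1 is the last id confirmed as replicated.
    DELETE FROM orders_shadow WHERE id <= $1;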

While slowing down the production database and introducing possible points of failure in the form of hundreds or thousands of triggers makes me uneasy, this approach works well.