Replication Services in a BI environment

Posted by

In this blog post I will explain the principles of SQL Server Replication Services without too much detail and I will take a look on the BI capabilities that Replication Services could offer in my opinion.

SQL Server Replication Services provides tools to copy and distribute database objects from one database system to another and maintain consistency afterwards. These tools basically copy or synchronize data with little or no transformations, they do not offer capabilities to transform data or apply business rules, like ETL tools do.
The only “transformations” Replication Services offers is to filter records or columns out of your data set. You can achieve this by selecting the desired columns of a table and/or by using WHERE statements like this:
SELECT <published_columns> FROM [Table] WHERE [DateTime] >= getdate() – 60

There are three types of replication:

Transactional Replication

Transactional replication components and data flow
This type replicates data on a transactional level. The Log Reader Agent reads directly on the transaction log of the source database (Publisher) and clones the transactions to the Distribution Database (Distributor), this database acts as a queue for the destination database (Subscriber). Next, the Distribution Agent moves the cloned transactions that are stored in the Distribution Database to the Subscriber.
The Distribution Agent can either run at scheduled intervals or continuously which offers near real-time replication of data!

So for example when a user executes an UPDATE statement on one or multiple records in the publisher database, this transaction (not the data itself) is copied to the distribution database and is then also executed on the subscriber. When the Distribution Agent is set to run continuously this process runs all the time and transactions on the publisher are replicated in small batches (near real-time), when it runs on scheduled intervals it executes larger batches of transactions, but the idea is the same.

Snapshot Replication

Snapshot replication components and data flow
This type of replication makes an initial copy of database objects that need to be replicated, this includes the schemas and the data itself. All types of replication must start with a snapshot of the database objects from the Publisher to initialize the Subscriber. Transactional replication need an initial snapshot of the replicated publisher tables/objects to run its cloned transactions on and maintain consistency.

The Snapshot Agent copies the schemas of the tables that will be replicated to files that will be stored in the Snapshot Folder which is a normal folder on the file system. When all the schemas are ready, the data itself will be copied from the Publisher to the snapshot folder. The snapshot is generated as a set of bulk copy program (BCP) files. Next, the Distribution Agent moves the snapshot to the Subscriber, if necessary it applies schema changes first and copies the data itself afterwards. The application of schema changes to the Subscriber is a nice feature, when you change the schema of the Publisher with, for example, an ALTER TABLE statement, that change is propagated by default to the Subscriber(s).

Merge Replication
Merge replication is typically used in server-to-client environments, for example when subscribers need to receive data, make changes offline, and later synchronize changes with the Publisher and other Subscribers, like with mobile devices that need to synchronize one in a while. Because I don’t really see BI capabilities here, I will not explain this type of replication any further.

Replication Services in a BI environment
Transactional Replication can be very useful in BI environments. In my opinion you never want to see users to run custom (SSRS) reports or PowerPivot solutions directly on your production database, it can slow down the system and can cause deadlocks in the database which can cause errors. Transactional Replication can offer a read-only, near real-time database for reporting purposes with minimal overhead on the source system.

Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards.
In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!

Update:
I got a questing regarding the supported Replication Services features in the different versions of SQL Server (Standard,Enterprise,etc). There is a nice table on MSDN that shows this!

One comment

  1. I’m currently doing this now with Transactional Replication, and have been for a while.  There are some minor issues to grapple with, but for the most part this is a very good workable solution.
    I also keep a ‘dev’ reporting database up to date using the exact same publications.
    One thing to keep your eyes on is the performance of the replication and latency during your peek reporting demands.  If the intention of this replicated database is to be near real time having bad queries lock/block and tie up the database will slow down replication.
    I have tweaked and fiddled and am finally happy with my installation and monitoring of the replication subsystems.  

    Like

Leave a comment