Database Replication
here are several uses for replication. For Example, disaster recovery or for reporting (analysis).When used for Disaster Recovery, replication is generally setup between geographically dispersed data-centers. For example, you are hosting your servers with a hosting company that has multiple data centers located around the country. You can setup replication service which will allow your database to be replicated in one more data centers. If the database in your primary data center is inaccessible, traffic could be re-routed to the secondary data center.
Apart from the disaster recovery, another usage I have seen is that the data is periodically replicated from the primary server to the secondary server for running reports or heavy analysis. This way, you can run your reports, setup data cubes etc. on a secondary server while the primary server is mainly for online transaction processing.
Key difference between Transaction Log shipping, Mirroring and Replication is that the secondary databases in both log shipping and mirroring are in standby mode and are mostly unusable but in replication both primary and secondary databases are active.
Replication can be configured in several ways such as...
- Snapshot - Data is simply copied from the main server to the one or more secondary servers, sometime even to a secondary database on the same server.
- Merging Replication - Data from two or more databases is merged into a single database
- Transactional Replication - Full initial copies of the databases are configured on one or more secondary servers and then periodic updates are applied.
Distributed database management system ensures that any inserts/updates/deletes that are performed on the data in one location are automatically synched with other locations.
There are two types of replication - Synchronous and Asynchronous. In Synchronous mode, the data is replicated on both primary and secondary server(s) at the same time, while in asynchronous replication, there is a delay before data is written to the secondary server. While asynchronous is a better option for performance reasons, a data loss could occur in case of a disaster.
Replication generally occurs either in the storage array (array of disks where the data is stored), at the server or in the network. Most SAN vendors provide a replication software that you can use to setup replication at the storage array level. There are also software solutions to setup replication at the server level, when you don't have a storage array. This is usually a cheaper solution but could impact server performance. Network replication usually is hardware based using an appliance.
Below are some of the Key differences between Log Shipping / Replication
- Latency - A minute or more in Log Shipping, just few seconds in Replication
- Schema altered at the primary server (also known as publisher)?
- Log Shipping - No
- Replication
- Snapshot - No
- Transactional - No
- Merge / Updates- Yes
- A GUID column is added if one doesn't exist.
- Schema altered at the secondary server (also known as subscriber)?
- Log Shipping - No
- Replication - Depends - see below
- Snapshot and Transactional Replication may make schema changes. Standard Snapshot and Transactional replication will not transfer identity attributes. Instead they become normal numeric columns on the secondary server.
- Are there any requirements at the Database Schema level?
- Log Shipping - No
- Replication - Primary Keys are required on the tables involved in replication.
- Can you Select individual Articles?
- Log Shipping - No
- Replication - Yes
- Is the Secondary server usable
- Log Shipping - Mostly No - Requires exclusive lock to apply the transaction log.
- Replication - Yes
Hopefully this article will provide you enough insight into replication
and how to get started if you are thinking about setting one.
Thank you and as always, your comments are welcome.