ReliableDR for VMware vSphere offers orchestration, automation and replication capabilities and, most importantly, automates the whole DR testing and validation process. It has been designed specifically for VMware vSphere. The product was recently acquired by PHD Virtual and further enhanced in the 3.1 release, which has just come out. ReliableDR provides automated failover and failback for applications and, of course, for the underlying VMs.
The product can verify several different points of an application to ensure that it is working correctly at the DR site. Dependencies are taken into account as well: for example, to successfully test whether an Exchange server is recoverable, you need the domain controller and DNS server to be running at the DR site. ReliableDR 3.1 can do that. How does ReliableDR compare to other DR products like SRM or Zerto? I'd say it's a mix of both.
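To make the dependency idea concrete, here is a minimal sketch in Python of how a start order could be derived from a dependency map. This is not ReliableDR's code or API. The VM names and the `boot_order` helper are hypothetical, and the only assumption taken from the text is that Exchange needs the domain controller and DNS server to be up first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each VM lists the VMs that must be running first.
dependencies = {
    "exchange": {"dc", "dns"},
    "dc": set(),
    "dns": set(),
}


def boot_order(deps):
    """Return VM names ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = boot_order(dependencies)
print(order)  # "exchange" always comes after both "dc" and "dns"
```

A circular dependency would make `static_order()` raise `graphlib.CycleError`, which is a reasonable signal that the plan itself needs fixing.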
Before I start the in-depth article, here are some of the hottest additions in the latest 3.1 release of ReliableDR:
- vCloud Director support – enables failover of a tenant into the cloud.
- Always On VMs (dependencies) – “shared components” for testing (DNS, DC…). The same base services are reused at the DR site to ensure that the dependency is always up.
- Web-based architecture – remote configuration, monitoring, and testing.
- Advanced reporting – enhanced compliance reports.
Not only can DR plans be created in a few simple steps, but those DR plans can also be tested automatically multiple times a day. During those failover tests, the applications running in those VMs can run in isolated environments (sandbox), or with configured connections to other production networks, so that they do not interfere with production VMs. These applications are also verified to check that they function (ping, services, functionality), and a certified recovery point is created together with a compliance report.
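The verify-then-certify logic described above can be sketched roughly as follows. This is an illustration only, not how ReliableDR is implemented: the `CheckReport` class, the `run_checks` function and the stub checks are all hypothetical, and real checks would actually ping the VM, query its services and fetch application content.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CheckReport:
    """Outcome of the verification checks for one VM (hypothetical structure)."""
    vm: str
    results: Dict[str, bool]

    @property
    def certified(self) -> bool:
        # A recovery point only counts as certified if every check passed.
        return bool(self.results) and all(self.results.values())


def run_checks(vm: str, checks: Dict[str, Callable[[str], bool]]) -> CheckReport:
    """Run each named check against the VM and collect pass/fail results."""
    return CheckReport(vm, {name: check(vm) for name, check in checks.items()})


# Stub checks standing in for real ping, service and content tests.
checks = {
    "ping": lambda vm: True,
    "services": lambda vm: True,
    "functionality": lambda vm: vm != "web01",  # pretend web01 serves a broken page
}

ok = run_checks("exch01", checks)
bad = run_checks("web01", checks)
print(ok.certified, bad.certified)
```

Keeping the checks as plain callables means a custom test for an in-house application is just one more entry in the dictionary.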
To do this you simply set up a mapping for datastores, networks and hosts. If you're using array-based replication with two identical arrays (one at the primary site and the second at the remote site), you'll set up array-based mapping too. Programming and orchestrating the DR plan takes 4 simple steps with hypervisor-based replication and 6 steps when SAN-based replication is used.
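As a rough illustration of what such a mapping amounts to, the sketch below translates a VM's production placement into its DR placement. The datastore, network and host names are made up, and `Placement`, `SITE_MAPPING` and `map_to_dr` are hypothetical names for this example; the product itself configures this through its web interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    datastore: str
    network: str
    host: str


# Hypothetical production-to-DR mapping for each resource type.
SITE_MAPPING = {
    "datastore": {"prod-ds01": "dr-ds01"},
    "network": {"Prod-LAN": "DR-Isolated"},
    "host": {"esx-prod-01": "esx-dr-01"},
}


def map_to_dr(placement: Placement, mapping=SITE_MAPPING) -> Placement:
    """Translate a production placement into the DR site equivalent."""
    try:
        return Placement(
            datastore=mapping["datastore"][placement.datastore],
            network=mapping["network"][placement.network],
            host=mapping["host"][placement.host],
        )
    except KeyError as exc:
        # An unmapped resource should fail loudly rather than at failover time.
        raise ValueError(f"no DR mapping defined for {exc.args[0]!r}") from exc


prod = Placement("prod-ds01", "Prod-LAN", "esx-prod-01")
print(map_to_dr(prod))
```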
The Architecture of ReliableDR 3.1
All you need is a single VM running Microsoft Windows Server 2008 R2, where the product is installed, usually at the remote site. A second vCenter at the DR site is not required, as the product hooks into the infrastructure through the VIX interface and uses VMware Tools to interact with the guest OS. This is very clever, as working through VIX lets you control all the necessary DR tasks for each VM:
- Registering and unregistering VMs.
- Powering VMs on and off.
- Managing VM snapshots.
- Adding and removing VM shared folders.
- Copying files into and out of the VM guest.
- Starting and stopping processes within the guest.
A typical architecture would deploy the product at the DR site (when we want to protect a primary site against failure) and remotely configure all the necessary jobs, DR plans, etc., as the solution is fully web based. The underlying SQL database not only stores the information necessary for configuration, setup, jobs, and failover and failback testing, but also provides detailed reporting capabilities, which can be further enhanced with dedicated reporting solutions like Crystal Reports.
Screenshot showing the creation of new job:
As a side note, the design decisions for the remote site concerning VMware licensing are very flexible. A second vCenter at the DR site is not necessary; a standalone host at the remote site is a possible scenario. ReliableDR doesn't use vCenter for orchestrating or running the failover or failback tasks at all. All this is done by the product itself. You could also have sites running a mix of vSphere versions (e.g. vSphere 4 at the first site and vSphere 5 at the second). The only thing to check is the virtual hardware version, because you can't run virtual hardware version 8 VMs (introduced with vSphere 5) on vSphere 4, which supports virtual hardware only up to version 7.
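The virtual hardware check boils down to one rule: a host can only run a VM whose virtual hardware version is not newer than the highest version that host supports. A minimal sketch, assuming vSphere 4.x tops out at version 7 and vSphere 5.0 at version 8 (the table and the `can_run` helper are illustrative, not part of any product):

```python
# Highest virtual hardware version supported per major vSphere release.
# Only the two releases discussed here are listed; 5 means vSphere 5.0.
MAX_HW_VERSION = {4: 7, 5: 8}


def can_run(vm_hw_version: int, target_vsphere_major: int) -> bool:
    """True if a VM with this hardware version can run on the target release."""
    return vm_hw_version <= MAX_HW_VERSION[target_vsphere_major]


print(can_run(7, 5))  # hardware version 7 VM on vSphere 5
print(can_run(8, 4))  # hardware version 8 VM on vSphere 4
```

In a mixed-version setup it is the failover (or failback) direction towards the older site that needs checking.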
ReliableDR 3.1 product features:
- Automated, Continuous, Service-Oriented DR Testing – Applications are tested taking into account their dependencies (for example, Exchange Server needs a Domain Controller together with a DNS server to be online). The tests validate network functionality (ping), application functionality (services) and also content functionality (Web page). The automated verification of those applications can be scheduled to run several times per day. In addition, it's possible to add tests for custom services, for applications which aren't common (for example, applications which have been developed internally). On the screenshot below you can see the groups of services. See the custom Service option: all you need to do is add the name of the Windows service you want to include in the DR recoverability tests. Linux VMs are supported as well.
- Certified Recovery Points – A very interesting feature. A Certified Recovery Point (CRP) is basically a snapshot-based state of a VM (or group of VMs) in which the applications (or services) running in those VMs were fully functional during the test, hence the name “certified”. CRPs are archived on a customizable schedule, and when failover is invoked you can choose not only the latest point in time, but also the latest CRP, which is the state where the VM(s) were certified as working (ping, services and functionality are tested). Both Windows and Linux applications are supported.
- 100% automated testing and failover – the tests can be scheduled to run several times a day, and the reporting always shows whether any warnings or problems occurred during the test.
- Compliance Reporting – shows compliance with the DR objectives. There are several columns in the main report section. Application RTO is the maximum amount of time that the job should run. If the job which starts up the VMs takes longer, a yellow warning appears, and an email or SNMP trap informs the admin. One of the reasons could be a large number of VMs configured in the same job. You can see the latest CRPs there as well.
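The choice between the latest point in time and the latest Certified Recovery Point can be pictured with a small sketch. The `RecoveryPoint` record and both helper functions are hypothetical and only illustrate the selection logic; they say nothing about how the product stores its recovery points.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class RecoveryPoint:
    taken_at: datetime
    certified: bool  # True if all tests passed against this point


def latest_point(points: Iterable[RecoveryPoint]) -> Optional[RecoveryPoint]:
    """Most recent recovery point, certified or not."""
    return max(points, key=lambda p: p.taken_at, default=None)


def latest_certified(points: Iterable[RecoveryPoint]) -> Optional[RecoveryPoint]:
    """Most recent recovery point that passed all tests."""
    return max((p for p in points if p.certified),
               key=lambda p: p.taken_at, default=None)


points = [
    RecoveryPoint(datetime(2013, 5, 1, 8, 0), certified=True),
    RecoveryPoint(datetime(2013, 5, 1, 12, 0), certified=True),
    RecoveryPoint(datetime(2013, 5, 1, 16, 0), certified=False),
]
print(latest_point(points).taken_at)      # the 16:00 point, not certified
print(latest_certified(points).taken_at)  # the 12:00 point, known to work
```

The trade-off is the usual one: the newest point loses the least data, while the newest certified point is the one that was actually proven to start and work.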
Screenshot of the Dashboard showing the compliance reports:
These recovery compliance reports can be sent by e-mail or generated as a PDF or Excel file.
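The RTO check behind the yellow warning is a simple comparison of job duration against the objective. A minimal sketch, with the status strings, the job names and the `rto_status` function chosen purely for illustration:

```python
from datetime import timedelta


def rto_status(job_duration: timedelta, rto: timedelta) -> str:
    """Return "warning" when the recovery job ran longer than its RTO."""
    return "warning" if job_duration > rto else "ok"


# Hypothetical jobs: (name, measured duration, configured RTO).
jobs = [
    ("exchange-job", timedelta(minutes=12), timedelta(minutes=15)),
    ("erp-job", timedelta(minutes=40), timedelta(minutes=30)),
]

report = {name: rto_status(duration, rto) for name, duration, rto in jobs}
print(report)
```

A job flagged this way is a candidate for being split up, since a large number of VMs in one job is one of the reasons for overrunning the RTO.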
- vCloud Director Integration – ReliableDR supports vCloud Director to protect workloads running under vCD.
- Per-VM Replication – depending on your vSphere storage architecture and its capabilities, either hypervisor-based snapshots or array-based replication can be used.
- Array-based replication – If your SAN already has a replication feature, you can set up replication based on your LUNs and leverage ReliableDR to orchestrate the DR plan. Many array-based replication products are already supported, and new ones are added all the time.
- Test, Failover, and Failback – Automation of failover and failback processes. There are many configuration possibilities as well, for example customizing the name of the VMs for the failover (by default it adds the extension DEMO to the VM name, but this is customizable). You can fail over to the production network or to a different, isolated network. The mapping is fully configurable.
You can also configure the VM to be completely renamed at the DR site, or keep the original name. Reconfiguration of the IP address (“re-IPing”) is supported as well. You can also change the boot order of the VMs which are present in the job.
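To show what renaming and re-IPing mean in practice, here is a small sketch using Python's standard `ipaddress` module. It assumes one possible scheme, keeping the host part of the address and swapping the subnet; the review does not say which scheme ReliableDR uses. The `dr_name` and `re_ip` helpers, the separator and the example subnets are all hypothetical.

```python
import ipaddress


def dr_name(name: str, suffix: str = "DEMO", separator: str = "-") -> str:
    """Build the failover VM name by appending a suffix (separator is assumed)."""
    return f"{name}{separator}{suffix}"


def re_ip(address: str, prod_net: str, dr_net: str) -> str:
    """Move an address from the production subnet to the same offset in the DR subnet."""
    prod = ipaddress.ip_network(prod_net)
    dr = ipaddress.ip_network(dr_net)
    ip = ipaddress.ip_address(address)
    if ip not in prod:
        raise ValueError(f"{address} is not in {prod_net}")
    if prod.prefixlen != dr.prefixlen:
        raise ValueError("production and DR subnets must be the same size")
    offset = int(ip) - int(prod.network_address)
    return str(dr.network_address + offset)


print(dr_name("exch01"))
print(re_ip("192.168.10.25", "192.168.10.0/24", "10.20.30.0/24"))
```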
A screenshot showing the recovery steps on single job:
- Admin Auditing – This allows you to audit who logged in and where, who did what, etc., with all of the details about all the jobs that have run. It's available in the Tasks and Events section (new in the 3.1 release), where you will find all the jobs that have run in the past, what action was performed, the start and end time, and, importantly, the result of the job.
- Role-based features – users can be assigned through Active Directory, can be local users, or a mix of both. This lets you give a user access to certain features only, which is useful for service providers. You can have different users configured in the local SQL DB, all talking to a single vCenter. To manage different users, you must create (or duplicate) a role, and then create a user who will be assigned to this role.
- Rebranding the login screen and reports – You can upload your own logo. This fully customizable feature is interesting not only for a single customer, but also for cloud providers, who can customize the login screens and the reports for each of their tenants and clients.
There are 2 main paid editions, plus a Free edition with limited functionality, which is basically limited to a single point in time.
- Enterprise Edition – application testing, fully featured version with Certified Recovery Points (CRPs).
- Foundation Edition – heartbeat of VMs only, not full application testing.
The ReliableDR editions screenshot:
The ReliableDR product uses vCenter as a starting point only. It does not install anything on vCenter or in the hypervisor. Instead it uses the VIX interface as an entry point for each VM. From there it can orchestrate and automate all the DR actions, like creating snapshots, powering the VMs on and off, etc.
A single Windows Server 2008 R2 VM at the remote site is all that is necessary for the whole solution to operate. The advanced scripting and orchestration engine, which powers the whole solution, is based on a SQL Express DB. Once configured, creating a failover plan takes just a few clicks. The powerful application-aware validation testing, together with the Certified Recovery Points (CRPs), is very beneficial for the admin, who is the person responsible for application and data integrity. It gives them a very powerful tool to automate the whole DR process at low cost.
ReliableDR has a good chance of success not only with bigger companies or service providers, but especially with SMBs. Smaller businesses can bring in DR capability with advanced validation and testing for internally built applications, because there are always some apps of this kind in this type of environment. The possibility of using the built-in replication for VMs allows locally attached storage to be used at remote sites, which lowers costs further.
Please note that this review was sponsored by PHD Virtual.
Mark Hall says
Vladan – Nice job on the review. ReliableDR certainly looks like a game-changer in the VMware DR space.