Zerowait Disaster Recovery / Prevention

How We Solved our Disaster Recovery Problem Using Standard Tools and a Bit of Creativity

Written by: Chris Mire, Engineering Manager

At Zerowait, we have been wrestling with our Disaster Recovery implementation for many years. We had weighed the cost of particular solutions against their ease of deployment and ease of implementation. We also had some challenges establishing a measurement of what would constitute a successful implementation.

One item that had to be resolved at the outset was defining some terminology and its implications. Disaster recovery applies when our primary facility is offline for 2 days or more, or when a physical disaster renders the primary facility inoperable. Business continuance refers to recovery from a shorter interruption in service, one that falls below our disaster recovery threshold and does not involve failing over to our secondary facility.
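As a rough illustration, the two definitions can be written down as a small decision rule. This is only a sketch in Python: the function and constant names are hypothetical and do not describe any tool we run. The 2-day threshold is the only value taken from the definitions above.

```python
from datetime import timedelta

# Threshold taken from the definition in the text: 2 days or more.
DR_THRESHOLD = timedelta(days=2)


def classify_outage(expected_downtime: timedelta,
                    facility_inoperable: bool = False) -> str:
    """Return which plan applies to an outage (hypothetical helper).

    expected_downtime   -- how long the primary facility is expected to be offline
    facility_inoperable -- True if a physical disaster has made the site unusable
    """
    if facility_inoperable or expected_downtime >= DR_THRESHOLD:
        return "disaster recovery"
    return "business continuance"


if __name__ == "__main__":
    print(classify_outage(timedelta(hours=6)))                            # business continuance
    print(classify_outage(timedelta(days=3)))                             # disaster recovery
    print(classify_outage(timedelta(hours=1), facility_inoperable=True))  # disaster recovery
```

Writing the rule down this way makes the boundary explicit: an outage of exactly 2 days counts as disaster recovery, and a physical disaster triggers it regardless of duration.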

We typically looked at DR solutions that promised little or no data loss between an event and the data available for use afterward.

Our DR facility is about 1,300 miles away (as the crow flies) from our Production facility. A modest 5Mb link connects the two sites, and site-to-site VPNs set up between our routers keep our data safe.

Our application servers are a combination of physical machines and VMware virtual machines. For storage, we use Zerowait SimplStor as well as NetApp. NetApp SnapMirror replicates both systems and data.

Our DR solution is tested yearly, and we just recently completed another successful DR test.

Now… before we talk about our implementation, we need to discuss some changes we have made in our infrastructure, which ultimately enabled our successful test.

The biggest change made was in our expectations, specifically what we consider an acceptable amount of data loss, and the transition of a few applications to be hosted offsite.

For years we ran all of our infrastructure in-house, which provided us the ability to strictly control and protect our data. We were replicating a ton of data to our DR facility, over that modest link, which caused some challenges replicating the data back to our primary facility. VMware’s SRM was implemented at one point as our DR solution. It proved costly, overly complex, and ultimately did not function as desired.
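To put the modest link in perspective, here is a back-of-the-envelope estimate of how long a bulk transfer takes. It is a sketch under stated assumptions: "5Mb" is read as 5 megabits per second, the 100 GB data set is a purely hypothetical figure rather than a measurement from our environment, and the `efficiency` parameter is an illustrative stand-in for VPN and protocol overhead.

```python
def transfer_hours(data_gb: float,
                   link_mbps: float = 5.0,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to move data_gb gigabytes over a WAN link.

    data_gb    -- data volume in decimal gigabytes (hypothetical input)
    link_mbps  -- link speed in megabits per second (assumes "5Mb" = 5 Mbit/s)
    efficiency -- fraction of the link actually usable, between 0 and 1
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    total_bits = data_gb * 1e9 * 8                   # gigabytes -> bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical 100 GB data set on a fully usable 5 Mbit/s link.
    print(f"{transfer_hours(100):.1f} hours")                   # 44.4 hours
    # Same data set if only 80% of the link is usable.
    print(f"{transfer_hours(100, efficiency=0.8):.1f} hours")   # 55.6 hours
```

Even under ideal conditions, a data set of that hypothetical size would tie up the link for nearly two days, which is the kind of arithmetic that makes replicating large volumes back to a primary facility painful.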

We evaluated available hosted services for some of our mission-critical applications, and chose to migrate our web hosting and email and to transition to IP phones. Moving those applications made a huge impact almost immediately. We were careful to fully vet the security and stability of our chosen vendors, and now have resilient, cost-effective services for our distributed work force.

Offloading these services from our internal infrastructure gave us the opportunity to re-evaluate our DR strategy, and ultimately redesign our implementation. We focused on 2 internal systems that were deemed mission critical for day-to-day operations in the event of a disaster.

Daily backups are made of our databases, which are stored on a NetApp Filer. Those files are SnapMirrored to our DR facility.

Citrix is used as part of our standard infrastructure, which provides our remote users with access to internal databases, as well as a few other applications.

We discovered that instead of using expensive tools and complex failover processes, we could simply load the replicated database files from our main site onto a standalone database server at our DR site.

Access to the DR site is facilitated by a simple and effective method of Citrix license swapping.

Having a replicated Citrix implementation is expensive, and can be complex to configure. Since Citrix licenses are bound to servers, we built an identical Citrix server at our DR facility and simply reallocated the licenses from our primary to our DR location. Once the initial license is downloaded to the DR Citrix server, the license can be toggled back and forth from the MyCitrix administration portal.

The result is a Disaster Recovery solution and plan that is effective, cost efficient, and simple to maintain.

Realistic bespoke solutions, paired with Zerowait’s outstanding service, are a couple of the reasons our clients come back to us year after year. We do not outsource our call center, so you will ALWAYS get a true Zerowait employee who is located in Delaware or Texas.

Do you want to know more?
