Archive for December 2012
Avoiding Failures in the Cloud: Part 4: Understand your cloud vendor’s monitoring capabilities and tools

(Note: This is the fourth part of a four-part blog post. For Part 1, please click here. For Part 2, please click here. For Part 3, please click here.)

By Paul Moxon, Senior Director, Product & Solutions Marketing

Understand your cloud vendor’s monitoring capabilities and tools

Whether its applications are housed on-premise or in the cloud, every organization needs to monitor how well those applications are running and recognize — ideally before the applications’ users do — when something has gone wrong.

Fortunately, most cloud-application hosts offer all the monitoring tools you’ve come to expect from your on-premise data center, allowing you to monitor your application instances using predefined and user-defined alerts.

For example, you could have an alert trigger if a threshold is breached for a specified period of time. The alert could be a simple email or text message to a system administrator, but it could also trigger an action (e.g., start up new application instances) or run a script (e.g., remap an elastic IP address to another application instance) to automatically rectify the problem.
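The “threshold breached for a specified period” rule above can be sketched in a few lines of Python. This is a minimal, vendor-neutral sketch — the function names, the CPU samples, and the 80% threshold are all hypothetical, not a real monitoring API. (On AWS, the managed equivalent would be a CloudWatch alarm.)

```python
from typing import Callable, List

def breached_for_period(samples: List[float], threshold: float, periods: int) -> bool:
    """True if the metric exceeded the threshold for the last `periods`
    consecutive samples (a sustained breach, not a momentary spike)."""
    if len(samples) < periods:
        return False
    return all(s > threshold for s in samples[-periods:])

def evaluate_alert(samples: List[float], threshold: float, periods: int,
                   on_alert: Callable[[], str]) -> str:
    """Run the alert action (email, script, scale-out) only on a sustained breach."""
    if breached_for_period(samples, threshold, periods):
        return on_alert()
    return "ok"

# Hypothetical CPU-utilization samples; alert if above 80% for 3 periods.
cpu = [55.0, 83.0, 86.0, 91.0]
print(evaluate_alert(cpu, 80.0, 3, lambda: "scale-out triggered"))
```

In a real deployment, `on_alert` would send the notification or kick off the corrective script; the point here is only that the trigger requires a *sustained* breach, which avoids paging someone over a one-sample spike.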

Again, these tools are almost certainly available from your cloud-application host. What’s critical is making sure your IT department understands how to interpret and act on the information they provide.

In summary, it’s important to keep top-of-mind that cloud applications are just as apt to fail as on-premise applications. However, most cloud-application hosts deliver features that enable you to design and build secure, scalable, and resilient applications that will meet your organization’s availability needs. By integrating the above four recommendations into its strategy, your IT department can minimize the impact of outages on your organization – instead of learning that lesson the hard way – and maximize the potential of its cloud-based offerings.

The tools and safeguards are available. You just have to make sure you use them.

Avoiding Failures in the Cloud: Part 3: Plan for disaster recovery

(Note: This is the third part of a four-part blog post. For Part 1, please click here. For Part 2, please click here.)

By Paul Moxon, Senior Director, Product & Solutions Marketing

Plan for disaster recovery

Every facet of a disaster recovery plan (e.g., application synchronization, failure detection, and remote startup) for an on-premise application is your sole responsibility. It’s a plan that relies on a replicated application — synchronized with the production application — standing by at a remote data center and ready to start up when the original application fails or becomes unavailable.

The disaster recovery plan for your cloud application is different, however. There are three important strategies to consider in that scenario.

First, your disaster recovery plan should rely on what Amazon Web Services calls availability zones; that is:

…distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location. Regions consist of one or more Availability Zones, are geographically dispersed, and will be in separate geographic areas or countries.

This kind of distributed resilience makes availability zones the ideal infrastructure on which to build a disaster recovery solution, one that would survive a complete failure of a cloud-application host’s data center and be able to replicate hosted applications in another availability zone. Whether your host calls them “availability zones” or something else, be sure they’re in place.
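As a toy illustration of why launching instances in separate zones matters, the sketch below round-robins instances across zones so that a single-zone failure takes down only a fraction of your capacity. The instance IDs and zone names are hypothetical; this models placement policy only, not any vendor’s launch API.

```python
from itertools import cycle
from typing import Dict, List

def spread_across_zones(instance_ids: List[str], zones: List[str]) -> Dict[str, str]:
    """Round-robin instances across availability zones so that losing any
    one zone leaves the remaining zones' instances serving traffic."""
    placement = {}
    zone_iter = cycle(zones)
    for iid in instance_ids:
        placement[iid] = next(zone_iter)
    return placement

# Hypothetical fleet of four instances across two zones.
print(spread_across_zones(["i-1", "i-2", "i-3", "i-4"],
                          ["us-east-1a", "us-east-1b"]))
```

With this placement, an outage of either zone leaves half the fleet running, which is the “protect your applications from failure of a single location” property the quote above describes.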

Second, by using elastic IP addresses in tandem with availability zones, the fixed address through which users connect to your application can be programmatically reassigned to a different target instance should the original instance fail. The new target instance can even reside in a different availability zone, enabling failover to a new zone in the event of a complete outage.
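The remapping decision can be sketched as follows. This is a simplified model of the *selection* logic only, with hypothetical instance IDs and zone names; in practice the actual reassignment would be a call to your vendor’s API (on AWS, the EC2 AssociateAddress operation).

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Instance:
    instance_id: str
    zone: str
    healthy: bool

def remap_elastic_ip(eip_map: Dict[str, Instance], eip: str,
                     standbys: List[Instance]) -> str:
    """Reassign the elastic IP to the first healthy standby, preferring one
    in a *different* availability zone than the failed instance (in case
    the failure is zone-wide). Returns the new target's instance ID."""
    failed = eip_map[eip]
    # Standbys outside the failed instance's zone sort first (False < True).
    ordered = sorted(standbys, key=lambda i: i.zone == failed.zone)
    for inst in ordered:
        if inst.healthy:
            eip_map[eip] = inst
            return inst.instance_id
    raise RuntimeError("no healthy standby available for " + eip)

# Hypothetical failover: the primary in us-east-1a has gone down.
eip_map = {"203.0.113.10": Instance("i-aaa", "us-east-1a", False)}
standbys = [Instance("i-bbb", "us-east-1a", True),
            Instance("i-ccc", "us-east-1b", True)]
print(remap_elastic_ip(eip_map, "203.0.113.10", standbys))
```

Because the fixed address is what users connect to, they see at most a brief interruption while the remap completes; no DNS change is needed.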

Finally, incremental backups – a simple method for replicating and synchronizing data across different availability zones – are an absolute must for a disaster-recovery solution. If your data changes constantly and you need your replicated data to be as up-to-date as possible, you’ll need to request frequent snapshots from your cloud vendor. If your data is relatively static, however, or if you can afford a failover that uses semi-stale data (i.e., thirty to sixty minutes old), snapshots can be less frequent, but they should still follow a schedule that matches how much data your organization can afford to lose.
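One way to reason about snapshot frequency is to work backward from how stale you can afford the standby copy to be – its recovery point objective (RPO). The sketch below is illustrative only: the five-minute allowance for the snapshot itself to complete is an assumption, not a vendor figure.

```python
def snapshot_interval_minutes(rpo_minutes: int,
                              snapshot_duration_minutes: int = 5) -> int:
    """Pick an incremental-snapshot interval that keeps the standby copy
    no staler than the RPO, leaving headroom for the snapshot itself to
    complete (assumed ~5 minutes here) so the newest *usable* snapshot
    is never older than the RPO."""
    interval = rpo_minutes - snapshot_duration_minutes
    if interval < 1:
        raise ValueError("RPO too tight for the assumed snapshot duration")
    return interval

# Semi-stale data acceptable (30- to 60-minute-old copies):
print(snapshot_interval_minutes(30))  # → 25
print(snapshot_interval_minutes(60))  # → 55
```

Constantly changing data pushes the RPO down and the snapshot frequency up; relatively static data lets you stretch the interval and reduce snapshot costs.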

(For Part 4, please click here.)