Rewriting Disaster Recovery Plans for the Edge

In an era when systems and applications are dispersed throughout the enterprise and the cloud, IT leaders have to rethink their disaster recovery plans.

Mary E. Shacklett

President of Transworld Data

September 12, 2019

Credit: Image: James Thew - stock.adobe.com

Writing a disaster recovery plan has been the responsibility of IT departments for years, but now these plans must be recalibrated to cover failover for edge and cloud environments. What's new, and how should organizations revise their plans?

Rule 1: IT does not control the edge

Given the adoption of edge computing and other distributed computing strategies, IT can’t control all of this distributed computing with a standard centralized DR plan that is built around the data center. In day-to-day manufacturing using robotics and automation, for example, it is line supervisors and manufacturing staff who run the robots and are responsible for making sure that these assets are safe and secure in locked areas when they are not in use. In many cases, these manufacturing personnel might also install, monitor, and maintain the equipment themselves, or work with vendors to do so.

These personnel do not have IT’s background in security, asset protection, maintenance, or monitoring. At the same time, installing new edge networks and solutions outside of IT multiplies the number of IT assets where failures could occur. Somewhere, DR and failover plans need to be documented, and staff trained on them, so these assets are covered. The most logical place for this to occur is within the IT DR and business continuity plan.

To revise the plan, IT must meet and work with these different distributed computing groups. The key is getting everyone involved and committed to documenting a DR and failover plan that they then participate in and test on a regular basis.

Rule 2: Cloud apps mean cloud DR consignment

In 2018, RightScale surveyed nearly 1,000 IT professionals and found that these companies were running on an average of 4.8 clouds.

It would be interesting to see how many of these companies have documented disaster recovery procedures for dealing with cloud outages. This concern crossed my mind when I recently reviewed the cloud vendors that a client was using and found that nearly all of them had clauses in their contracts that excused them from liability if a disaster occurred.

The takeaway: If your IT department hasn't already done so, it should write each cloud vendor that you use into your disaster recovery plan. What are the SLAs that the vendor is promising for backup and recovery? If there is a failure, what are your (or your vendor’s) DR plans? Do you have an agreement with your vendor to annually test the apps that you use on the cloud for DR failover?
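The three questions above can be turned into a simple per-vendor checklist. The following is a minimal sketch, not a prescribed format: the `CloudVendor` fields, the vendor names, and the one-year test window are all assumptions made for illustration, and the real values would come from your own contracts and test records.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CloudVendor:
    # Hypothetical fields; adapt them to what your contracts actually state.
    name: str
    recovery_sla_hours: Optional[float]  # promised recovery time, None if the contract is silent
    dr_plan_documented: bool             # is there a written DR procedure (yours or the vendor's)?
    last_failover_test: Optional[date]   # date of the last DR failover test, None if never tested


def dr_gaps(vendor: CloudVendor, today: date, max_test_age_days: int = 365) -> List[str]:
    """Return the DR plan gaps found for one vendor (empty list means none found)."""
    gaps = []
    if vendor.recovery_sla_hours is None:
        gaps.append("no backup/recovery SLA in contract")
    if not vendor.dr_plan_documented:
        gaps.append("no documented DR procedure")
    if vendor.last_failover_test is None:
        gaps.append("failover never tested")
    elif (today - vendor.last_failover_test).days > max_test_age_days:
        gaps.append(f"last failover test is more than {max_test_age_days} days old")
    return gaps


# Made-up example vendors, purely for illustration.
vendors = [
    CloudVendor("ExampleCRM", 4.0, True, date(2019, 3, 1)),
    CloudVendor("ExampleStorage", None, False, None),
]

for v in vendors:
    print(v.name, dr_gaps(v, today=date(2019, 9, 12)))
```

A vendor that comes back with an empty list has at least been written into the plan; any other result is a prompt for a conversation with that vendor.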

Rule 3: Physical security is important

The more your IT gravitates to the edge, finding its way into manufacturing plants or field offices, the more physical security becomes entwined with disaster recovery. What if a field office in a remote desert location overheats and a server fails? Or an unauthorized employee enters a cage area in a manufacturing plant and tampers with a robot? Your DR plan should include regular inspections and tests of equipment and facilities at distributed physical locations, not just at your central data center.

Rule 4: DR communications must get better

A number of years ago, when I was CIO in a banking operation, we experienced an earthquake and our IT went offline. There was minimal damage to the data center, but networks and communications throughout the area were disrupted, so tellers in branch offices had to handle customer transactions by keeping manual ledgers that they would then enter into the system when service returned.

During this time, a customer asked a teller what was wrong and she told him, “Our entire computers have been hit.” The information spread like wildfire throughout the community and media, and we had a lot of customers rushing in, trying to close accounts.

This type of situation is exacerbated when you have even more people controlling IT assets, as you do in edge computing. This is why it’s so important to have a communications “tree” that explains who communicates what to whom during a disaster, and that everyone adheres to.

Normally, the communications “voice” should be the company’s public relations team. This team coordinates with upper management and issues statements about the disaster to the community and the media.

If this communications channel is not firmly established and entrenched in the minds of your employees, you could find yourself spending more time on disaster recovery from errant communications than on the actual disaster.
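A communications tree of the kind described above can be written down as a simple mapping from each role to the roles or audiences it informs. The sketch below is only an illustration: the role names and the shape of the tree are assumptions, not a recommended org chart. It shows two things the tree makes checkable: the order in which people are notified, and who is authorized to speak to a given audience.

```python
from collections import deque
from typing import Dict, List

# Hypothetical communications tree: each role lists who it informs directly.
COMM_TREE: Dict[str, List[str]] = {
    "incident_lead": ["upper_management", "cio"],
    "upper_management": ["public_relations"],
    "public_relations": ["media", "community"],
    "cio": ["it_staff", "site_supervisors"],
    "site_supervisors": ["branch_staff"],
}


def notification_order(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Walk the tree breadth-first so each level is informed before the next."""
    order: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        role = queue.popleft()
        order.append(role)
        for contact in tree.get(role, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return order


def authorized_speakers(tree: Dict[str, List[str]], audience: str) -> List[str]:
    """List the roles that the tree allows to address a given audience."""
    return sorted(role for role, contacts in tree.items() if audience in contacts)


print(notification_order(COMM_TREE, "incident_lead"))
print(authorized_speakers(COMM_TREE, "media"))
```

In this made-up tree, only the public relations role has the media and the community as direct contacts, which mirrors the single "voice" described above; branch staff receive information but have no outside audience assigned to them.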

Rule 5: DR must be for multiple geographies

With edge computing and remote offices on the rise, it goes without saying that DR can no longer be centralized in one location or data center. Especially if you are using clouds for DR, choose cloud providers that have multiple geo-locations. This enables a failover to a location that is up and running in the event that your main data center, or a cloud data location, goes down. These failover data center scenarios should be included in your DR plan and tested.
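The failover decision itself is simple to express once the plan lists locations in order of preference. The sketch below assumes a hypothetical list of site names and a health status supplied by whatever monitoring you already run; none of these names come from a specific provider.

```python
from typing import Dict, List, Optional


def pick_failover_site(preferred_order: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first site in the plan's preferred order that is reported healthy.

    Sites missing from the health report are treated as down.
    """
    for site in preferred_order:
        if healthy.get(site, False):
            return site
    return None


# Hypothetical site names and health report, purely for illustration.
plan_order = ["main-datacenter", "cloud-region-east", "cloud-region-west"]
health_report = {
    "main-datacenter": False,   # primary is down
    "cloud-region-east": False, # first cloud location is also down
    "cloud-region-west": True,
}

print(pick_failover_site(plan_order, health_report))
```

A result of `None` means every listed location is down, which is exactly the scenario a single-geography plan cannot survive.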

Rule 6: DR testing plans must be recalibrated

If you’re going to consign more IT to the cloud and deploy more edge computing, add new DR testing scenarios to your plan so that DR documentation and testing are in place for all of these new locations. You want to know that your DR will work in every scenario the company might face if you ever have to enact it.

Rule 7: The C-suite must give more than lip service to DR

The move to cloud and to edge computing has complicated disaster recovery, which means that most organizations need to review and revise their DR plans. These reviews and revisions take time, and DR is already a low priority for most organizations, one that tends to lag behind the long list of projects waiting to be delivered.

Because of the changes that cloud and the edge have brought to IT, it is up to the CIO to impress upon management and the board how these changes have affected DR, and why effort and time must be put into revising the DR plan.

Rule 8: Edge and cloud vendor involvement in DR should be secured

As mentioned earlier, a majority of cloud vendors don't give much assurance for disaster recovery and failover in their contracts. Before you sign on the dotted line with a cloud vendor, make the vendor's disaster recovery commitment and support part of your RFP and an important point of discussion.

Rule 9: Network redundancy is paramount

Many organizations focus on recovery of systems and data when disasters strike, but place less emphasis on networks. However, given the role of the Internet and wide area networks today, network DR failover and redundancy should also be built into DR plans.