Maintaining operational resilience

What is disaster recovery?

In today's environment, technology and business processes are becoming increasingly intertwined. Access to information not only moves an organization forward; it also helps identify threats and risks proactively. This capability is a cornerstone of effective recovery and preparedness when evaluating risk to an organization or community. I'd challenge anyone to identify a critical service, whether in your own workplace or in any other industry, that doesn't rely on technology.

Consider your own organization and how your day-to-day activities depend on its systems and on the availability of data. Whether you work in manufacturing, finance, healthcare, or retail and distribution, operational awareness depends on a core set of systems and solutions that maintain operational resilience. In most organizations, technology assets are considered critical assets, which makes it important to conduct both a threat risk assessment and a business impact assessment focused on these core technology components.

Why is it important?

In many organizations, the labels "business continuity" and "disaster recovery" are used interchangeably, and although they share some elements, they are not one and the same.

Every critical service that relies on technology will be disrupted to some degree if it has to operate without that technology. The key driver for business continuity planning (BCP) is how much disruption your business can tolerate and how much you are able and willing to spend to avoid it. It is always a balance between the two. If financial resources weren't an issue, every business using technology would probably choose fully redundant, zero-downtime systems. However, financial resources are always an issue.
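To make this balance concrete, the following is a minimal sketch that picks the least expensive recovery option whose expected downtime stays within what the business says it can tolerate. The option names, costs and downtime figures are hypothetical and chosen only for illustration; a real assessment would draw them from the business impact assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str              # hypothetical option label
    annual_cost: float     # hypothetical yearly cost of running the option
    downtime_hours: float  # hypothetical expected downtime after a disruption


def cheapest_acceptable(options, tolerable_downtime_hours):
    """Return the lowest-cost option within the downtime tolerance, or None."""
    acceptable = [o for o in options if o.downtime_hours <= tolerable_downtime_hours]
    return min(acceptable, key=lambda o: o.annual_cost, default=None)


# Illustrative figures only.
options = [
    RecoveryOption("restore from offsite backups", 20_000, 48),
    RecoveryOption("warm standby site", 120_000, 8),
    RecoveryOption("fully redundant, zero-downtime", 600_000, 0),
]

choice = cheapest_acceptable(options, tolerable_downtime_hours=12)
print(choice.name if choice else "no option meets the tolerance")
```

With a 12-hour tolerance this selects the warm standby site; only a zero-hour tolerance justifies paying for the fully redundant option.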

Disaster recovery is the part of business continuity that deals with the immediate impact of an event. Recovering from a server outage, a security breach or a natural disaster all fall within this category. Disaster recovery usually has several discrete steps in the planning stages, though those steps blur quickly during implementation because a real crisis never unfolds exactly as planned.

This highlights the importance of an inclusive planning process that involves key stakeholders across divisional boundaries.

Benefits of a disaster recovery program

Like business continuity planning, developing a comprehensive disaster recovery program delivers benefits beyond the obvious one. Although the fundamental reason for developing and maintaining a disaster recovery program is to recover quickly from an adverse event that impedes your organization's critical services, several other benefits add to its importance. Organizational resilience through quick response and recovery can become a competitive advantage while increasing the confidence of your stakeholders and customers. As the business environment becomes more competitive, meeting compliance requirements becomes increasingly necessary. A thorough disaster recovery program supports compliance with standards and requirements by considering the potential threats and risks to your organization. Putting realistic, detailed, repeatable and practiced plans in place for all aspects of your infrastructure, systems and data will reduce risk and help maintain the necessary compliance standards.

Developing a disaster recovery program

The framework used to conduct a threat risk assessment of the organization's most critical services and assets can also be applied to its technology landscape. This model can deliver a repeatable, consistent technique for assessing an organization's IT capability and presents data in a format that is easy to understand and operationalize. The model consists of three steps:

  1. Data gathering - IT-specific data must be collected, either through an automated process such as a configuration management database or through the other data-collection methods we've discussed. This data is needed to understand where redundancy exists, where single points of failure might exist within the technology infrastructure or core systems, and whether appropriate contingencies are in place to ensure availability.

  2. Capability assessment - this step scores the assets in the IT environment, based on the data gathered, using a set of prescriptive scoring methods. These methods assess whether each application's capabilities meet its recovery point objectives (RPOs), recovery time objectives (RTOs) and service-level targets. This step also evaluates the strength of staffing, documentation and disaster recovery planning. To ensure objective scoring, this step should be facilitated by an unbiased, independent team. Consider why this is important and be ready to discuss it during this week's class.

  3. Prioritized remediation - based on the resulting scores, organizations can identify the services in the IT environment with the greatest capability gaps, which are in the greatest need of remediation.
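The capability assessment and prioritized remediation steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a prescribed scoring method: the application names, the 0-5 rating scale for staffing, documentation and DR planning, and the weight of 3 points for each missed RPO or RTO are all hypothetical, and service-level targets are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppCapability:
    name: str
    rpo_target_h: float   # recovery point objective: hours of data loss tolerated
    rpo_actual_h: float   # achievable recovery point, from the gathered data
    rto_target_h: float   # recovery time objective: hours to restore service
    rto_actual_h: float   # achievable recovery time, from the gathered data
    staffing: int         # hypothetical 0-5 rating
    documentation: int    # hypothetical 0-5 rating
    dr_planning: int      # hypothetical 0-5 rating


MISSED_OBJECTIVE_WEIGHT = 3  # hypothetical penalty for each missed RPO or RTO
MAX_RATING = 5


def gap_score(app: AppCapability) -> int:
    """Higher score means a larger capability gap (illustrative weighting)."""
    score = 0
    if app.rpo_actual_h > app.rpo_target_h:
        score += MISSED_OBJECTIVE_WEIGHT
    if app.rto_actual_h > app.rto_target_h:
        score += MISSED_OBJECTIVE_WEIGHT
    # Each rating contributes its shortfall from the maximum rating.
    score += sum(MAX_RATING - r for r in (app.staffing, app.documentation, app.dr_planning))
    return score


def prioritize(apps):
    """Order applications from greatest to smallest capability gap."""
    return sorted(apps, key=gap_score, reverse=True)


# Hypothetical assessment data.
apps = [
    AppCapability("payroll", 24, 24, 72, 48, 4, 4, 3),
    AppCapability("order entry", 1, 4, 4, 12, 3, 2, 2),
    AppCapability("email", 4, 1, 8, 10, 5, 3, 4),
]

for app in prioritize(apps):
    print(app.name, gap_score(app))
```

Here "order entry" misses both objectives and has weak ratings, so it tops the remediation list; keeping the weights as named constants makes it easy for an independent assessment team to agree on and adjust them.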

This assessment model can be customized and applied in any organization to take an IT-centric view to assess organizational resilience. When developing a disaster recovery plan, it is important to adopt a "top-down & bottom-up" view.

  • A top-down view of solutions gives planners an end-to-end view of the environment, or of a single process or application. Comparing top-down views of applications side by side can help planners understand how resources are shared, or could be shared.
  • A bottom-up view tells planners the details about each component that supports a system or process. This allows for determining precisely what changes need to be made to ensure resilience and the ability to recover effectively and efficiently.
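One way to picture the two views is with a simple dependency map, such as one exported from a configuration management database. In the following minimal sketch, the application and component names and the "has a redundant counterpart" flags are hypothetical. The top-down view lists everything one application relies on, while the bottom-up view shows which applications share each component, which in turn exposes non-redundant shared components as single points of failure.

```python
from collections import defaultdict

# Hypothetical application-to-component dependencies.
DEPENDENCIES = {
    "order entry": ["web cluster", "orders db", "core switch"],
    "payroll": ["payroll app server", "hr db", "core switch"],
    "email": ["mail service", "core switch"],
}

# Hypothetical flags: does each component have a tested redundant counterpart?
REDUNDANT = {
    "web cluster": True,
    "orders db": True,
    "core switch": False,
    "payroll app server": False,
    "hr db": True,
    "mail service": True,
}


def top_down(app):
    """End-to-end view: every component a single application relies on."""
    return sorted(DEPENDENCIES[app])


def bottom_up():
    """Component view: the set of applications that depend on each component."""
    users = defaultdict(set)
    for app, components in DEPENDENCIES.items():
        for component in components:
            users[component].add(app)
    return users


def single_points_of_failure():
    """Non-redundant components, most widely shared first.

    Components missing from REDUNDANT are treated as non-redundant.
    """
    users = bottom_up()
    spofs = [c for c in users if not REDUNDANT.get(c, False)]
    return sorted(spofs, key=lambda c: (-len(users[c]), c))


print(top_down("email"))
print(single_points_of_failure())
```

In this example the core switch supports all three applications and has no redundant counterpart, so the bottom-up view flags it ahead of the payroll application server.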

Finally, disaster recovery planning is about recovering after an event, and it requires a coordinated, integrated approach that spans the entire company and its operations. As such, the plan needs to be tested and practiced across the organization repeatedly.

There are numerous reasons for testing the plan. The obvious reason is to make sure the plan will work in the event of a real disruption or disaster. Beyond that, testing helps the plan work more effectively because it serves several purposes:

  • Checks for understanding of processes, procedures, and steps by those who must implement the plan.
  • Validates the integration of tasks across the various business units and management functions.
  • Confirms the steps developed for each phase of the plan's implementation.
  • Determines whether the right resources have been identified.
  • Familiarizes all involved parties with the overall process and flow of information.
  • Identifies gaps or weaknesses in the plan.
  • Determines cost and feasibility.

What value can Castellan bring to your organization?

  1. Expertise: Our team brings deep real-world expertise and experience in implementing and managing business continuity across organizations. Focusing on the business value of technology and walking the organization through a Business Impact Analysis helps set expectations around the most critical systems, technologies and data required to respond and recover quickly during adverse events.

  2. Industry Standard Frameworks: Our consultants use a consistent, industry-standard framework for the recovery and continuity of an organization's most critical services and processes. A consistent framework also makes it easier to work beyond your organization's boundaries with partners, suppliers, and other parties needed to respond quickly during unwanted events.

  3. Expert Facilitation and Training: Developing a framework and plan is only the first step in effectively managing the risks and impacts of emergencies. Castellan staff are experienced in facilitating awareness and training programs and in leading disaster recovery exercises across a broad range of scenarios and teams within the organization.

  4. Easy to Digest Reports: The best plans are easy to understand, whatever the reader's role or responsibilities. Recommendations coming from the Capability Assessment have tangible operational goals, metrics, and clearly defined roles and ownership. Disaster recovery documentation needs enough detail that the appropriate roles within the organization can follow it and trust that the outcomes are achievable, and it should present understandable, time-boxed options that lay out a feasible path to increasing the organization's resilience.

  5. Personalized Approach: Castellan focuses only on Preparedness, Risk, Compliance & Governance, which allows us to offer highly personalized consulting services, build strong partnerships, and work closely with you to address your specific needs and challenges. We collaborate with your key staff to design a customized security service that aligns with your requirements, so that our services are tailored to your organization.

  6. Staff Cost-savings: By opting for our professional services, you gain access to our team at a fraction of the cost of hiring an in-house security expert. This offers significant cost savings while still benefiting from the extensive knowledge and skills of our team of experts.

In the past, many organizations have not focused on disaster recovery the way they prioritize other functions such as finance, operations, or HR. However, as events over the last several decades have shown, more and more emphasis is being placed on understanding risks and the need for continuity plans. Disaster recovery needs to be prioritized in your organization's planning and operational functions. Unlike many other programs, it is not a once-and-done proposition but an ongoing commitment to keeping your most critical services operational, ultimately strengthening your organization's resilience!