RECOVER: the capacity of organizations to restore their services following a cyber-attack

Posted date 07/10/2021

Author

INCIBE

Companies must be prepared to prevent, protect against and react to security incidents that could affect their business. The main business processes therefore need to be protected through a set of tasks that allow the organization to recover from a serious incident within a time frame that does not compromise service continuity. This ensures a planned response to any security breach, which helps protect the company's image and reputation and mitigates the financial impact and the loss of critical information caused by such incidents.

[Image: Business continuity plan priorities]

The Cyber-resilience Improvement Indicators (CII) model is a diagnostic and measurement tool specially designed to help organizations self-assess their capacity to anticipate, resist, recover and evolve in the face of incidents. These are the four goals of cyber resilience, the key to recovering from incidents. Specifically, the recover goal makes it possible to determine whether an organization is prepared to successfully restore its services following a cyber-attack. To measure this goal, its two functional domains, incident management and service continuity management, are analyzed.

[Image: Definition of Recover]

Incident management

Within the recover goal, the incident management functional domain measures the ability to identify suspicious situations (events) that, taken together, are likely to become a confirmed security violation (incident). The following actions help achieve this goal:

Establish a process to detect and report events. This process implements mechanisms to identify events in the infrastructures that support the service, such as unauthorized access attempts, high response times or an increase in file volume, and to report them to those responsible, who will analyze them immediately or at a later stage. Automatic real-time event detection tools, such as SIEM (Security Information and Event Management), IDS/IPS (Intrusion Detection and Prevention Systems) or SOC (Security Operations Center) services, can contribute to this process.
Establish a procedure for classifying and assessing cyber incidents, based on a predefined classification. Recording metrics for each cyber incident, such as the detection date, reporting date, resolution date and closing date, can be very helpful in defining this classification. The procedure will also serve to demonstrate the organization's regulatory compliance in internal or external audits.
Document and communicate the criteria that will help the organization's personnel identify and recognize a cyber incident so that it can be reported and analyzed. It is vital to rely on methodologies that allow these objectives to be met; for example, the National Guidelines for Reporting and Managing Cyber Incidents can be used. To this end, an organizational structure or management committee must be established to respond to cyber incidents, along with a formal protocol for reporting them to the relevant officials. A policy can be drawn up to periodically review and check compliance with the above.
Develop a procedure for analyzing incidents in order to identify the actions needed to resolve them in the shortest possible time, for example by answering the following questions: What happened? Who is affected (users, customers, suppliers)? What should I tell them? Whom should I notify? Does it have legal or contractual implications? Do we have control over the affected services and systems?
Estimate the cyber incident response capacity through the average response time, measured as the time between the moment a cyber incident affecting a service occurs and the moment it is resolved. If the organization has not yet suffered a cyber incident, the times obtained in its business continuity tests may be used instead. For this calculation, it is advisable to follow guidance from bodies such as the Spanish Data Protection Agency and the national law enforcement agencies.

Remember that you can use INCIBE-CERT's Incident Response mailboxes, particularly if your organization is a critical operator in the private sector that offers an essential service under the PIC Law, that is, a service "necessary for the maintenance of the basic social functions, health, security, social and economic well-being of citizens, or the effective operation of State Institutions and Public Administrations". It is important to report serious incidents that have occurred in the organization.

Furthermore, if the service provided by the company is supported by an Industrial Control System (ICS), special attention should be paid to incidents affecting the physical security of SCADA elements that are geographically distributed outside the organization's headquarters (industrial plants, outdoor sites, etc.).
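
As an illustration of the incident metrics and the average response time discussed above, the following is a minimal Python sketch. The `Incident` class, its field names (including `occurred`, which is not among the dates listed in the text) and the sample data are assumptions made for this example, not part of the CII model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Incident:
    """A cyber incident with the dates suggested as metrics in the text."""
    name: str
    occurred: datetime                   # assumed field: when the service was first affected
    detected: datetime
    reported: datetime
    resolved: Optional[datetime] = None  # None while the incident is still open
    closed: Optional[datetime] = None


def mean_response_time(incidents: List[Incident]) -> Optional[timedelta]:
    """Average time from occurrence to resolution, over resolved incidents only."""
    seconds = [
        (i.resolved - i.occurred).total_seconds()
        for i in incidents
        if i.resolved is not None
    ]
    if not seconds:
        return None  # nothing resolved yet: continuity-test timings could be used instead
    return timedelta(seconds=mean(seconds))


# Hypothetical incidents, for illustration only.
incidents = [
    Incident("phishing", datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 30),
             datetime(2021, 3, 1, 10, 0), resolved=datetime(2021, 3, 1, 13, 0)),
    Incident("ransomware", datetime(2021, 5, 10, 8, 0), datetime(2021, 5, 10, 8, 15),
             datetime(2021, 5, 10, 8, 45), resolved=datetime(2021, 5, 10, 16, 0)),
    Incident("open case", datetime(2021, 6, 2, 11, 0), datetime(2021, 6, 2, 12, 0),
             datetime(2021, 6, 2, 12, 30)),
]

print(mean_response_time(incidents))  # 6:00:00
```

The two resolved incidents took 4 and 8 hours, so the average is 6 hours; the open case is left out until it has a resolution date.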

Service continuity management

The general objective of this functional domain is to establish processes for developing, reviewing, testing and executing service continuity plans. It also covers processes for maintaining an adequate level of controls to protect services that are indispensable for the proper functioning of the company and that depend on the actions of external entities. To this end, we recommend:

Support the provision of the organization's service with a Continuity Plan that is updated periodically and in a disciplined manner, and also whenever new risks or changes in the organizational or operational environment become known. To start this plan, we recommend the contents of this dossier: Contingency and business continuity plan.
Test the Continuity Plan for the provision of the organization's service. In other words, verify that test protocols exist for the Service Continuity Plan and that the plan is checked regularly, in order to:
1. Determine the feasibility, completeness and accuracy of the plan with respect to the service offered by the organization.
2. Collect information on the organization’s readiness.
3. Ensure that the recovery time objective (RTO) is not only documented but also used to ensure service continuity. Also check that the RTO meets the continuity requirements of the service offered by the organization.
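
One way to put the RTO to use, rather than merely documenting it, is to compare it against the recovery times measured in each continuity test. The sketch below assumes a hypothetical `check_rto` helper, a 4-hour RTO and two made-up test results; none of these values come from the text.

```python
from datetime import timedelta
from typing import Dict, List


def check_rto(test_recovery_times: List[timedelta], rto: timedelta) -> Dict[str, object]:
    """Compare the slowest recovery measured in continuity tests with the RTO."""
    if not test_recovery_times:
        raise ValueError("no continuity tests recorded; the RTO cannot be verified")
    worst = max(test_recovery_times)
    return {
        "worst_recovery": worst,
        "within_rto": worst <= rto,
        "margin": rto - worst,  # negative when the RTO was exceeded
    }


# Hypothetical values, for illustration only.
rto = timedelta(hours=4)
tests = [timedelta(hours=2, minutes=30), timedelta(hours=3, minutes=45)]

result = check_rto(tests, rto)
print(result["within_rto"], result["margin"])  # True 0:15:00
```

Using the slowest test rather than the average is a deliberately conservative choice: a single test that exceeds the RTO is enough to flag the plan for review.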

If the service offered by the organization is based on an Industrial Control System (ICS) that cannot be stopped completely to run Continuity Plan tests, alternatives can be considered: partial or phased stops, tests on a replica of the system, or even simulation.

Identify and prioritize the external dependencies that contribute, directly or indirectly, to the provision of the service offered by the organization: those linked to public services, such as emergency services or law enforcement agencies, and those linked to basic supply and telecommunications providers.
Identify and properly manage the risks associated with external dependencies, such as cloud service providers and other technology providers, that contribute, directly or indirectly, to the provision of the service offered by the organization, and prioritize and update the risks identified. This makes it possible to know whether, for each external entity, the organization has established and documented a detailed set of requirements that must be met, and what the risk of not meeting them is: for example, a drop in the availability of the service or supply, or loss of customer confidence.
Supervise and manage the operation of the external dependencies that support the provision of the service offered by the organization. Here we check that the operations of third parties that contribute, directly or indirectly, to the provision of the service are supervised regularly, so that compliance with the cyber-resilience requirements agreed between the parties is verified. We must also check whether these requirements have been included in the clauses of the outsourced service agreements, or Service Level Agreements (SLAs), reached with such entities: for example, the maximum time of non-availability of server infrastructure, or penalties in case of non-compliance.
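
An SLA clause on maximum non-availability, as mentioned in the last point, can be checked periodically against a provider's recorded outages. The following sketch assumes a hypothetical `sla_report` function, a 30-day review period, a 2-hour downtime limit and non-overlapping outage records; all of these are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one period of non-availability


def sla_report(outages: List[Outage], period: timedelta,
               max_downtime: timedelta) -> Dict[str, object]:
    """Summarize a provider's downtime over a review period against an SLA limit.

    Assumes the outages do not overlap; overlapping records would be counted twice.
    """
    downtime = sum((end - start for start, end in outages), timedelta())
    availability_pct = 100.0 * (1 - downtime / period)
    return {
        "downtime": downtime,
        "availability_pct": availability_pct,
        "compliant": downtime <= max_downtime,
    }


# Hypothetical outage log for one provider, for illustration only.
outages = [
    (datetime(2021, 9, 3, 2, 0), datetime(2021, 9, 3, 3, 0)),
    (datetime(2021, 9, 17, 14, 0), datetime(2021, 9, 17, 14, 30)),
]

report = sla_report(outages, period=timedelta(days=30), max_downtime=timedelta(hours=2))
print(report["downtime"], report["compliant"])  # 1:30:00 True
```

A report like this could feed the regular supervision of third parties, with a non-compliant result triggering whatever penalties the agreement defines.
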
To study this topic further, you can consult the ISO/IEC 27001:2017 standard, the National Security Framework, and national and European legislation and its implementing provisions (the NIS Directive (EU) 2016/1148 and its transposition through Royal Decree-Law 12/2018).

All organizations are exposed to attacks; what matters most is being prepared to detect them and to proactively anticipate them by implementing protection measures. The goals of anticipating, resisting, recovering and evolving are strategic in order to ensure resilient service delivery. In particular, the recover goal is essential to successfully restore normal service following a cyber incident.
