What Is Risk Assessment in SMS
Risk assessments in aviation safety management systems (SMS) are how you qualify, quantify, and rank risk exposure for:
- Reported safety issues;
- Audit findings and concerns; and
- Risk scenarios (consequences) associated with identified hazards.
Risk assessments are absolutely central to decision making in SMS, as all actions performed on the safety issue depend on the initial and subsequent risk assessments. Risk assessments are based on your risk matrix. How well you define your risk matrix sets the stage for critical safety events in your risk management processes.
How do you define your risk matrix? What problems will an ill-suited risk matrix cause you?
Determining Risk Index from the Risk Matrix
Using the default ICAO risk matrix (pictured at right), these assessments account for risk exposure by documenting:
- A number that corresponds to the Probability of negative outcomes;
- A letter that corresponds to the Severity of most likely negative outcomes; and
- The composite "risk index" that combines probability and severity.
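As a minimal sketch of how the composite is formed, assuming the convention of a probability number from 1 to 5 and a severity letter from A to E (the function name `risk_index` is hypothetical, chosen only for illustration):

```python
# Assumed scales: probability 1-5 and severity A-E, as in a 5x5 matrix.
PROBABILITY_LEVELS = (1, 2, 3, 4, 5)
SEVERITY_LEVELS = ("A", "B", "C", "D", "E")


def risk_index(probability: int, severity: str) -> str:
    """Combine a probability number and a severity letter into a risk index."""
    severity = severity.upper()
    if probability not in PROBABILITY_LEVELS:
        raise ValueError(f"probability must be one of {PROBABILITY_LEVELS}")
    if severity not in SEVERITY_LEVELS:
        raise ValueError(f"severity must be one of {SEVERITY_LEVELS}")
    return f"{probability}{severity}"


print(risk_index(5, "C"))  # 5C
```

Validating both inputs up front keeps malformed indexes, such as a probability of 6, out of your assessment records.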
Risk assessments are generally performed by the responsible safety manager, but depending on the organization and the context, they might be performed by:
- Safety team;
- Safety committee; or
- Subject matter expert.
Aviation service providers usually perform risk assessments on reported issues at various times, including:
- After the issue is reported and entered into the risk management system;
- After all corrective and preventive actions (CPAs) are implemented and the reported issue is closed; and
- During review after the reported issue is closed and validated to ensure CPA effectiveness.
Reported hazards or issues may also be reassessed at various stages of risk management as new information becomes available. Risk assessments are accounted for with risk matrices in almost every SMS. Some aviation service providers use drop-down lists or simple values from a list when they don't have access to a risk matrix. These providers are a minority, and there are strong arguments for using a risk matrix whenever possible.
Other common elements in your SMS requiring risk assessments include:
- Audit findings; and
- Risk scenarios from your hazard register.
For simplicity and to avoid confusion, we'll focus on reported safety concerns (issues) during this discussion.
Here are some factors to consider as you start to assess safety issues in SMS programs.
1 - Understand What a Risk Matrix Is
A risk matrix is a grid used to calculate risk weight using a risk index (composite of probability and severity) in a risk assessment. Here are the most important details about risk matrices:
- Usually in a 5x5 grid, though it can be larger or smaller;
- Generally organized by colors (usually three), with some colors representing low risk, or an acceptable level of safety (ALoS), and others representing an unacceptable level of safety:
  - Green equals low risk;
  - Yellow equals medium risk;
  - Red equals high risk; and
  - Some organizations add gradients, such as orange and light/dark red, to further distinguish the level of risk exposure.
- Each level of probability and severity will have specific, identifiable elements for what constitutes that level; and
- Selecting the appropriate severity/probability results in the composite risk index – such as 5C, 2B, etc. – that ranks risk exposure for risk associated with a reported issue.
While the risk matrix may be a 3x3, 4x4 or 5x5, you will commonly see an additional column and row for the legends. Risk matrix legends describe each element's respective weight, depending on whether you are describing probability or severity. You will commonly see a legend in the risk matrix either at the:
- Left column;
- Top row; or
- Bottom row.
In the example at the right, you will see the severity legend at the top row and the probability legend at the first column on the left.
Defining the identifiable elements for each level of probability and each level of severity is critical for consistent, proper use of a risk matrix, as well as for defining ALoS in your organization.
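One way to picture the grid is as a lookup from risk index to color. The sketch below assumes a 5x5 matrix with three colors. The cell assignments are illustrative only (the green set in particular is an assumption), so substitute the cells from your own matrix. The name `risk_color` is hypothetical.

```python
# Illustrative three-color assignment for a 5x5 matrix.
# These sets are assumptions for the example, not a standard.
RED = {"5A", "5B", "5C", "4A", "4B", "3A"}
GREEN = {"1C", "1D", "1E", "2D", "2E", "3E"}


def risk_color(index: str) -> str:
    """Return the color band for a risk index such as '4B'."""
    index = index.upper()
    if len(index) != 2 or index[0] not in "12345" or index[1] not in "ABCDE":
        raise ValueError(f"not a valid risk index: {index!r}")
    if index in RED:
        return "red"
    if index in GREEN:
        return "green"
    # Every remaining cell of the grid falls in the middle band.
    return "yellow"


print(risk_color("4B"))  # red
print(risk_color("3C"))  # yellow
```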
2 - Define Probability in Risk Matrix
Defining probability for a risk matrix means looking at each level of probability in an array, such as the numbers 1, 2, 3, 4, 5 listed in the default ICAO risk matrix, and adding a description of each element. Probability elements can mean either:
- Frequency, such as 1/1000 operations; or
- Likelihood, such as, “Has happened in company in last year.”
Here’s something extremely important to note – when we are talking about Probability, we are talking about:
- Realistic probability OF the hazard occurrence or recurrence.
To assign a probability to a safety concern, you need to first identify the hazard.
Describing Probability/Likelihood Table in Risk Matrix
From the legend above, we see the probability descriptions in the first column on the left. You are free to use this as an example, but the simplest, most generic descriptions will also suffice. For example, your risk probability descriptions could be represented by the following table:
| Level | Probability | Description |
|-------|-------------|-------------|
| 1 | Highly unlikely | May occur only in exceptional circumstances. |
| 2 | Unlikely/improbable | Could occur some time. |
| 3 | Possible/remote | Might occur some time. |
| 4 | Likely/occasional | Will probably occur at some time. |
| 5 | Certain/frequent | Is expected to occur frequently. |

Simple Probability Descriptors
If you require a more sophisticated probability scale, feel free to modify the following probability descriptors.
| Level | Description |
|-------|-------------|
| 1 | Unknown in industry |
| 2 | Known in industry |
| 3 | Happened before in company |
| 4 | Reported > x times in company |
| 5 | Reported > x times at this location |

Alternate Probability Descriptors
Important Note: Keep your risk matrix simple and easy to understand. When you make your risk matrix too complicated, you risk:
- Poor results from risk assessments;
- Improper use due to complexity; and
- Redoing your work when the risk matrix is changed.
When you have to re-create or restructure your risk matrix, you must consider all the legacy risk assessments your team completed during the earlier risk management activities. In some cases, this could translate into hundreds, thousands or tens of thousands of historical risk assessments. Be careful here. You will need to consider a strategy to remap your legacy risk assessments whenever you make changes to your risk matrix.
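A remapping strategy can be as simple as a translation table plus a flag for anything that needs a human decision. The sketch below assumes a purely hypothetical change in which a three-level probability scale is expanded to five levels and severity letters stay the same. The mapping values and the name `remap_index` are illustrative assumptions.

```python
from typing import Optional

# Hypothetical change: the probability scale grows from three levels to five.
# Severity letters are left unchanged in this sketch.
OLD_TO_NEW_PROBABILITY = {"1": "1", "2": "3", "3": "5"}


def remap_index(old_index: str) -> Optional[str]:
    """Translate a legacy risk index, or return None if it needs manual review."""
    probability, severity = old_index[:1], old_index[1:]
    new_probability = OLD_TO_NEW_PROBABILITY.get(probability)
    if new_probability is None:
        return None
    return new_probability + severity


legacy = ["1A", "2C", "3B", "4D"]
remapped = {old: remap_index(old) for old in legacy}
print(remapped)  # {'1A': '1A', '2C': '3C', '3B': '5B', '4D': None}
```

Returning `None` rather than guessing keeps unexpected legacy values visible, so they can be reviewed by hand instead of being silently reclassified.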
3 - Define Severity in Risk Matrix
Defining Severity is the same process as defining Probability/Likelihood. You move through each level of severity, such as A B C D E, and describe consequences that represent a given level of severity.
When considering the consequences to your organization, aviation safety professionals evaluate harm or damage to:
- People (employees, customers, stakeholders);
- Environment (air, water, ground, historical sites);
- Company assets (buildings, aircraft, vehicles);
- Operations (the mission);
- Security; and
- Company reputation.
Importantly, whereas Probability accounts for the likelihood or frequency of the hazard occurrence, Severity accounts for the:
- Severity OF the most likely risk occurrence outcomes of the Hazard.
In other words, given a hazard, what is the severity of the likely outcomes?
For example, descriptions for a level 2 severity may be:
- Minor injury to one person; OR
- Minor degradation to current mission; OR
- Minor effect on local environment; OR
- Less than $100,000 in damages.
Any safety issue’s risk occurrence that falls into the above categories likely deserves a level 2 severity, unless there is an element of the safety issue that falls into a higher category.
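The "unless a higher category applies" rule amounts to taking the highest severity level found across the consequence categories. A minimal sketch follows, assuming numeric severity levels where a higher number means a more severe outcome, as in the "level 2" example above. The category names and the function `overall_severity` are hypothetical.

```python
from typing import Dict


def overall_severity(levels_by_category: Dict[str, int]) -> int:
    """Return the highest severity level across all consequence categories."""
    if not levels_by_category:
        raise ValueError("at least one consequence category is required")
    return max(levels_by_category.values())


# Mostly level 2 consequences, but asset damage reaches level 3.
assessment = {"people": 2, "operations": 2, "environment": 1, "assets": 3}
print(overall_severity(assessment))  # 3
```

If your matrix labels severity with letters instead, the same idea applies once each letter is translated to its rank.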
Describing Severity Table in Risk Matrix
From the risk matrix depicted at the right, we see the severity descriptions in the legend at the top row. Again, feel free to use this risk matrix as an example, but other synonymous descriptions are adequate.
When we were describing legend items for probability, we provided two models to choose from: simple and an alternative. We'll do the same for severity.
| Level | Severity | Description |
|-------|----------|-------------|
| A | Catastrophic | Equipment destroyed. Multiple deaths. |
| B | Major | Major equipment damage. Loss of one life or serious injury. |
| C | Moderate | Significant damage to equipment and/or injuries. Serious incident. |
| D | Minor | Slight degradation of mission performance. Minor incident or use of emergency procedures. |
| E | Insignificant | No significant consequence. |

Simple Severity Descriptors
The following alternative is highly desirable because it allows you to customize the descriptions based on the type and size of your operations. You may wish to add additional columns, such as reputation or security.
| Level | Severity | People | Environment | Assets | Operations |
|-------|----------|--------|-------------|--------|------------|
| A | Catastrophic | Multiple fatalities | Massive effect | Extensive damage > $xxx | Total mission failure. Massive loss > $xxx |
| B | Critical | Single fatality | Major effect | Major damage < $xxx | Multiple significant mission element failures. Unable to continue mission. Major financial loss < $xxx |
| C | Significant | Serious injury | Localized effect | Local damage < $xxx | Single significant mission element failure. Maybe unable to continue mission. Substantial financial loss < $xxx |
| D | Marginal | Minor injury | Minor effect | Minor damage < $xxx | Certain mission element failures. Mission may be able to continue with minor degradation. Minor financial loss < $xxx |
| E | Negligible | Slight / No injury | Slight / No effect | Slight (< $xxx) / No damage | Minor degradation. Mission continues. No financial loss. |

Alternate Severity Descriptors
- Don't be alarmed if your risk matrix axes are reversed. There is no requirement that your risk matrix follows a particular model, unless your risk matrix is explicitly defined in your SMS manual. After a dozen years of building risk matrices for aviation service providers, we have seen almost every possible combination of risk matrix.
- Modify the $xxx placeholders in the alternative severity descriptors to reflect your own financial risk tolerance.
4 - Define Acceptable Level of Safety (ALoS)
Once you have defined Probability and Severity, you should be able to consistently perform risk assessments with strong justification for your assessment. In aviation SMS documentation, you may run into the acronym ALoS (Acceptable Level of Safety).
Think of ALoS as your risk tolerance. What must you do when risk is above a defined threshold as calculated by your risk index?
- Must you stop the operation?
- Can you continue, but with restrictions?
- Must you perform or review risk mitigation strategies?
- Who has risk acceptance authority based on defined risk tolerance?
Keeping Risk Tolerance Actions Simple
The risk matrix at the right is a good example of a more sophisticated risk matrix that has six levels of risk tolerability. This is not a bad example, and in the case of the company using this risk matrix, they can justify using the different levels of risk tolerance. A simple justification for using the risk matrix at the right is that it allows you to more accurately risk rank reported issues and hazard-related risk scenarios. Let's look at an example:
Imagine you want to list all your issues based on your risk index. Based on the three-color 5x5 risk matrix, we have:
- 6 red cells;
- 6 green cells; and
- 13 yellow cells.
Any two cells with the same color above will have the same "weight" and subsequent ALoS descriptors. As you know, not all red cells carry the same significance, meaning that a 5A risk index does not carry the same risk as 5C, 4B or 3A. In order to accurately refine your sorting criteria, you need more colors to define your risk tolerability table. You should only adopt this strategy if you have a valid business case, as additional risk tolerance levels quickly complicate risk management activities.
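To illustrate why color alone is a coarse sorting key, the sketch below ranks four red cells by a numeric score. The score (probability multiplied by a severity weight, with A = 5 down to E = 1) is an assumption made for this example, not a formula from any standard, and `risk_score` is a hypothetical name.

```python
# Assumed weights: A is the most severe, E the least.
SEVERITY_WEIGHT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def risk_score(index: str) -> int:
    """Illustrative score: probability number times severity weight."""
    return int(index[0]) * SEVERITY_WEIGHT[index[1].upper()]


# All four cells are red, so sorting by color cannot separate them.
red_cells = ["5C", "3A", "4B", "5A"]
ranked = sorted(red_cells, key=risk_score, reverse=True)
print(ranked)  # ['5A', '4B', '5C', '3A']
```

Even this score leaves 5C and 3A tied at 15. Any finer ranking scheme is a design choice your organization has to justify and document.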
The main take-away is that you should avoid adding complexity to your risk matrix solely because the risk matrix looks "cool and sophisticated." You should have a good business reason that is understood by everyone in the organization that must use this matrix.
Over the past dozen years, we have seen a handful of overzealous safety managers over-engineer their risk tolerance descriptors. We have seen that as time passes, and new managers enter the company, a simple blueprint is most effective. Three to four levels of action based on a calculated risk index are better than five, six or seven. Reduced confusion and shorter training times result from fewer levels.
You may define an Acceptable Level of Safety in the following way:
- Acceptable: any assessment composite (number/letter combo) or color that IS considered an acceptable level of risk exposure; and
- Unacceptable: any assessment composite (number/letter combo) or color that is NOT considered an acceptable level of risk exposure.
Defining an Acceptable Level of Safety simply involves documenting what is an acceptable risk assessment, and what is not. Once you have defined ALoS, you will know which process to follow for managing a safety issue, given the calculated risk index from the risk assessment.
The above ALoS descriptions (Acceptable and Unacceptable) are a good start, but adding a third level will afford more flexibility. Furthermore, having three levels will align better with your risk matrix that has three colors:
- Green - Acceptable;
- Yellow - Acceptable, but requires a risk review or additional mitigation strategies; and
- Red - Unacceptable.
A four-color risk matrix is also very common. In this case, some cells are orange.
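The three-level scheme can be written down as a small lookup so that everyone applies the same ALoS descriptor to a given color. This is only a sketch: the dictionary and the function `alos_for` are hypothetical names, and a fourth color such as orange would need its own entry defined by your organization.

```python
# Three-level scheme: one ALoS descriptor per risk matrix color.
ALOS_BY_COLOR = {
    "green": "Acceptable",
    "yellow": "Acceptable, but requires risk review or additional mitigation",
    "red": "Unacceptable",
}


def alos_for(color: str) -> str:
    """Look up the ALoS descriptor for a risk matrix color."""
    try:
        return ALOS_BY_COLOR[color.lower()]
    except KeyError:
        # Colors without a documented descriptor are rejected, not guessed.
        raise ValueError(f"no ALoS defined for color: {color!r}") from None


print(alos_for("Yellow"))
```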
Assign Probability and Severity, and Document
To assess safety issues in SMS programs, you will simply document the risk index (composite of Probability/Severity) of the issue. You will do this during an initial assessment, on issue closure, on issue review, and perhaps as new information comes in requiring a re-assessment during the risk management process (between initial/closing assessments). Essential data elements to document include:
- Date of assessment;
- Risk index;
- Phase of assessment (initial, closing, review); and
- Who performed assessment.
Another important element worth capturing is an "Assessment Justification." An assessment justification communicates your thought process as you performed your risk assessment. What were you thinking about when you evaluated risk associated with the reported issue?
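The data elements above, plus the justification, map naturally onto a single record. The sketch below is one possible shape, with hypothetical field names and made-up example values. A real system would add its own identifiers and validation.

```python
from dataclasses import dataclass
from datetime import date

# Phases named in the list above.
PHASES = ("initial", "closing", "review")


@dataclass(frozen=True)
class RiskAssessment:
    """One documented risk assessment for a reported issue."""
    assessed_on: date
    risk_index: str
    phase: str
    assessed_by: str
    justification: str = ""

    def __post_init__(self) -> None:
        if self.phase not in PHASES:
            raise ValueError(f"phase must be one of {PHASES}")


# Example values are invented purely for illustration.
record = RiskAssessment(
    assessed_on=date(2024, 1, 15),
    risk_index="3C",
    phase="initial",
    assessed_by="Safety Manager",
    justification="Similar event reported once before; moderate damage credible.",
)
print(record.risk_index, record.phase)  # 3C initial
```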
With information acquired during the risk analysis process, first review each level of severity and decide which severity level's description best matches your current issue. Consider the worst credible scenarios as you evaluate risk. Then repeat the process for probability.
Now you have your composite risk index. Make sure the evaluated risk index is stored in an easy-to-find place, where you can generate reports or perform data mining at a later stage. Having a database of documented risk assessments is extremely valuable for monitoring SMS performance.
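As a small illustration of the kind of data mining stored assessments allow, the sketch below counts how often each risk index appears among initial assessments. The sample data is invented for the example, and a real report would query your own database instead of a hard-coded list.

```python
from collections import Counter

# Invented sample data: (phase, risk index) pairs from stored assessments.
assessments = [
    ("initial", "4B"),
    ("closing", "2D"),
    ("initial", "3C"),
    ("closing", "2D"),
    ("initial", "4B"),
]

# Tally risk indexes for initial assessments only.
initial_counts = Counter(index for phase, index in assessments if phase == "initial")
print(initial_counts.most_common(1))  # [('4B', 2)]
```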
All of the steps above can be facilitated by using commercially available aviation risk management software. You could do it in a spreadsheet, but if your company has more than 50-60 employees, we recommend using SMS software designed for this purpose.