Bjorn’s Corner: The challenges of airliner development. Part 25. Safety monitoring and reporting

By Bjorn Fehrm, Henry Tam, and Andrew Telesca

October 15, 2021, ©. Leeham News: Last week, we introduced the activities around Continued Airworthiness that we have to do during development and flight testing of our aircraft.

As described, most aircraft accidents are attributed to failings in Continued Airworthiness and Operations rather than design. We listed Continued Operational Safety, Operational Preparedness, and Service Readiness as the three important areas for Continued Airworthiness.

We dive into Continued Operational Safety first, specifically Safety Monitoring and Reporting.

Figure 1. A graph showing how an OEM and the FAA survey the operation of an aircraft and take action. Source: Boeing.

Safety Monitoring and Reporting

Safety Monitoring and Reporting is regulated in FAA 14 CFR Part 21, §21.3, which requires that the design approval holder (us, the OEM) report any failure, malfunction, or defect in any product or article we manufacture that we determine has caused, or could cause, a significant safety issue. Examples are fires (the 787 lithium-ion battery fires), structural issues (SuperJet horizontal tail cracks), engine issues (A220 PW1500G compressor disc failures), and flight control malfunctions (737 MAX issues).

Note: We describe it all from an FAA and US OEM perspective. Each OEM is regulated by its national regulator, which has corresponding rules.

Additionally, if an investigation by us or our suppliers shows that a product or part is unsafe because of a manufacturing defect, the holder of the production approval for that product or part must report the results of its investigation to the FAA. It must also report any action taken or proposed to correct that defect.

If action is required to correct the defect, we must send the necessary data to the appropriate aircraft certification office so it can issue an appropriate airworthiness directive (14 CFR Part 39).

Requirements also govern the statistical and safety analysis needed to show how long the OEM can take to report and establish corrective action, and the maximum calendar time current operators have to incorporate the corrective action, based on the severity and likelihood of the hazard. As with most regulations, the goal is to reduce public risk to an acceptable level relative to the economic costs of the changes and disruptions.

An example of this trade-off is the reaction time and content of the initial 737 MAX Airworthiness Directive (AD), which didn’t keep the second accident from happening.

How do we satisfy these requirements?

There are a few key systems and processes that we need to build as a new OEM, prior to EIS, to satisfy these regulations:

Failure Reporting, Analysis, and Corrective Action System (FRACAS)

FRACAS shall establish clear and easy communication between us and our customers around in-service events. These range from basic non-safety issues like lavatory failures to major in-flight incidents like engine shutdowns. Every time our product experiences a failure, we should be able to collect, categorize, evaluate, and, if necessary, report the failure.
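As a rough illustration of the collect, categorize, evaluate, and report flow, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the four severity labels, and the rule that anything rated hazardous or worse goes to the regulator. The actual reportable occurrences are listed in the regulation itself, and a real system would be far richer.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Illustrative severity ladder (assumption, not taken from the article).
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4


@dataclass
class FailureReport:
    # Hypothetical fields an operator might send us for each in-service event.
    aircraft_id: str
    system: str
    description: str
    severity: Severity


def needs_regulator_report(report: FailureReport) -> bool:
    # Assumed company rule: escalate anything rated HAZARDOUS or worse.
    return report.severity.value >= Severity.HAZARDOUS.value


# Collect two example events, one at each end of the range.
lavatory = FailureReport("fleet-001", "lavatory", "flush inoperative", Severity.MINOR)
engine = FailureReport("fleet-002", "engine", "in-flight shutdown", Severity.HAZARDOUS)

for event in (lavatory, engine):
    action = "report to regulator" if needs_regulator_report(event) else "log and trend"
    print(f"{event.aircraft_id} {event.system}: {action}")
```

Even the events that are only logged stay in the database, since they feed the trending and the other uses of the service data.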

The service data also feeds our safety process, as well as our spare parts inventory and pricing, maintenance programs, and future development activities. It requires an upfront investment in customer contract clauses, IT interfaces, process development, databases, and personnel.

As the field data is received, we need engineers who can evaluate the failures individually and then in the context of past failures to understand whether they are a concern. The potential severity of reports is evaluated using predictive statistics in combination with an understanding of failure modes for the system. For example, is this normal mortality according to the bathtub curve, or the beginning of an epidemic failure?
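One simple way to frame the "normal mortality or epidemic" question is to compare the number of failures seen in the field with the number expected from the predicted failure rate. The Python sketch below does this with a Poisson model. The choice of model, the function names, the example rate, and the 1% alarm threshold are all assumptions for illustration. A real analysis would also look at failure age (for example with a Weibull fit) and at the physics of the failure mode.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """Probability of seeing at least `observed` events when `expected` are predicted."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


def looks_epidemic(observed: int, rate_per_fh: float, fleet_hours: float,
                   alpha: float = 0.01) -> bool:
    # Flag the part when the observed count would be very unlikely
    # if the predicted failure rate were still valid (assumed threshold: alpha).
    expected = rate_per_fh * fleet_hours
    return poisson_tail(observed, expected) < alpha


# Hypothetical part: predicted rate 1E-05 per flight hour, 100,000 fleet hours so far,
# so about one failure is expected.
print(looks_epidemic(2, 1e-5, 100_000))  # two failures: plausibly normal scatter
print(looks_epidemic(6, 1e-5, 100_000))  # six failures: hard to explain by chance
```

A flag from a check like this is only a prompt for engineering review, not a verdict.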

Results are compared against regulatory guidance and company policy to evaluate fleet risk. Figure 2 is from the EASA guidance for its Part 21.3 implementation on Part 25 aircraft. It shows how long an aircraft may remain exposed to a potentially catastrophic failure, depending on the failure probability.

Figure 2. From EASA CS25. Acceptable exposure time in months (X-axis) before action, dependent on failure risk (Y-axis). Source: EASA

So for a failure probability of one event per million flight hours (1E-06), the acceptable time before action is not even a month. Change it to one event per 100 million flight hours (1E-08), and it can take up to 60 months before action is necessary.
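To get a feel for these two points, the short Python calculation below estimates the expected number of events across a fleet during the allowed exposure time. The fleet size of 100 aircraft and the utilization of 250 flight hours per aircraft per month are assumptions chosen only for illustration. One month is used as an upper bound for the 1E-06 case, since the text says the allowance is less than that.

```python
FLEET_SIZE = 100        # assumed number of aircraft in service
HOURS_PER_MONTH = 250   # assumed flight hours per aircraft per month


def expected_events(prob_per_fh: float, months: float,
                    fleet_size: int = FLEET_SIZE,
                    hours_per_month: float = HOURS_PER_MONTH) -> float:
    # Expected events = probability per flight hour x total fleet flight hours.
    return prob_per_fh * fleet_size * hours_per_month * months


high_risk = expected_events(1e-6, 1)    # 1E-06 per flight hour, at most 1 month
low_risk = expected_events(1e-8, 60)    # 1E-08 per flight hour, up to 60 months

print(f"1E-06 for 1 month:   {high_risk:.3f} expected events")
print(f"1E-08 for 60 months: {low_risk:.3f} expected events")
```

Under these assumptions, both cases come out at a few hundredths of an expected event across the fleet. That suggests the allowed time shrinks roughly as the probability grows, but the actual limits should always be read from the EASA guidance itself.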

Depending on the assessed level of risk, our team proposes short-term control and long-term corrective actions. These range from visual inspections and maintenance events to, for a serious malfunction with a high probability of happening, fleet grounding and redesign.

The process we build must go before our safety review board, which evaluates the sufficiency and practicality of our proposed corrective actions. The system shall communicate the actions to all of our impacted customers, and we shall be able to prove that all have received our communication.

For low-risk items, this is done through service letters and service bulletins, while high-risk items require significant FAA coordination and the issuance of FAA Airworthiness Directives (ADs), as shown in Figure 1.

Notices of Escapement (NOEs)

NOEs follow a similar path to FRACAS, but rather than in-service failures, we look at quality escapes, both from our factory and from our supplier base (ref. the latest Boeing 787 titanium parts issue).

Safety Review Board / Corrective Action Review Board (SRB/CARB)

The SRB/CARB governs how we make decisions on failures and defects that have a potential safety impact. This is a joint process between us and the regulator. This is where people would argue the MAX process failed after the first accident, as the corrective actions agreed and implemented were not sufficiently aggressive to prevent the second accident.

Airworthiness Directives and Alternate Means of Compliance (ADs/AMOCs)

One outcome of the SRB is the decision by the FAA to issue an AD. The SRB also discusses AMOCs, which is how we adjust ADs to account for fleet variation and the identification of more pragmatic solutions to the issue after the initial AD is issued.

Other regulatory bulletins (Figure 1)

Notice of Proposed Rulemaking (NPRM)

When in-service issues demonstrate a significant, but not immediately critical, opportunity for improvement in safety regulations, the regulator drafts a potential new regulation and releases it for public comment under an NPRM or Notice of Proposed Amendment (NPA). Depending on public feedback, these may or may not become rules that future development programs will be required to incorporate.

Immediately Adopted Rulemaking (IAR)

These are regulations with a sufficiently high and clear safety impact that they bypass parts of the normal rulemaking process and can be immediately applied to in-work projects. A bit like an AD, but applied to the OEM rather than the operator.

It’s quite some responsibility and work

As these systems take time to develop, we have to start implementation more than a year before EIS, as we have full obligations to the regulator from the first day of commercial operation. We must run these processes and systems for as long as we have aircraft in operation in the market.

Not all issues are big issues

The examples we have given are all major incidents. In reality, the regulators issue ADs daily, for issues small and large. The FAA publishes the present list.

Next week, we will look at the role Instructions for Continued Airworthiness (ICAs) play in preventing these types of air safety issues. 

7 Comments on “Bjorn’s Corner: The challenges of airliner development. Part 25. Safety monitoring and reporting”

  1. On the subject of safety, the following article (from just a few minutes ago) is interesting:
    “The delivery flight of TUIfly Belgium’s latest Boeing 737 MAX runs into technical problems and returns to Boeing Field”

    In particular:
    “According to some sources, it suffered an electrical stabiliser trim failure. The replay of the flight on Flightradar24 shows that the crew had difficulties maintaining altitude and never reached FL150 requested by ATC, nor were they able to maintain FL130 instructed later to return to BFI.”

    • Yes. We can all read the daily list of incidents from Aviation Herald too.

      Engine failure…. missile explosion near aircraft…. That was just some from Saturday’s run of the mill issues.

      • I wouldn’t classify mass skill deterioration amongst pilots as “run of the mill”, but perhaps that’s just a NZ way of looking at the world…

        • It is “run of the mill” in scope of “what did you actually expect would happen …”.

          contrary to what some naive individuals believe
          future blowback is accessible to premeditation.

          • “what did you actually expect would happen …”

            Since nobody in February 2020 “expected” the air travel hiatus to last as long as it has (and still is), this assertion is moot.

            And since the entire aviation sector has been caught off guard by this phenomenon, it would seem that the sector has nobody with the “premeditation” powers to which you refer…so that assertion can also be binned as fantasy.

            Any other fantasies that you want to share?

          • “Any other fantasies that you want to share?”

            I only share realities.

            some don’t realize this. even if that reality has sunk her teeth into their noses. 🙂
