Bjorn’s Corner: New pitch trim issue forces further changes to 737 MAX software

By Bjorn Fehrm

June 28, 2019, ©. Leeham News: “The Federal Aviation Administration has asked The Boeing Company to address, through the software changes to the 737 MAX that the company has been developing for the past eight months, a specific condition of flight, which the planned software changes do not presently address.”

This is the text of an 8-K filing Boeing issued to the stock market two days ago. Here is what it means.

What the 8-K filing means.

Here is what Wiki says about an 8-K filing: Form 8-K is a very broad form used to notify investors in United States public companies of specified events that may be important to shareholders or the United States Securities and Exchange Commission.

The filing means the FAA has found a flaw in the software Boeing has developed to fix the MCAS problem. The find and its consequences are significant enough that Boeing’s shareholders should be informed about it. It can affect the value of Boeing on the stock market.

The flaw is not related to MCAS but to how the revised software affects the aircraft’s processors in the Flight Control computers when these have simulated fault conditions.

During a check of how different faults (in this case a fault in one of the microprocessors in the Flight Control computer) can cause Trim Runaway conditions, the FAA found that the 737 MAX Flight Control computer was overwhelmed by the data flows the simulated fault caused. This delayed the actions the FAA pilot could take to stop the trim runaway.

The FAA pilot classified the resulting pitch change and the delay to stop it as a “catastrophic” event, meaning the plane could crash if this fault happened in flight.

What does this mean for the MAX?

Boeing must go back and change the updated software for the Flight Control computer so this data flow condition does not occur. The changes will take time and further delay the 737 MAX’s reentry into service. This is why the company did an 8-K filing.

The speculation is the changes will add another two months to the delay of 737 MAX reentry into service, meaning the anticipated September reentry now seems optimistic. It also adds to the string of bad news around the 737 MAX.

The discovery was not made in the part of the code which handles MCAS. It was found during a wider verification that the software changes haven’t produced any secondary hazards in the 737 MAX flight control system.

Software changes in a flight control system are always verified with an exhaustive FMEA analysis (Failure Mode and Effects Analysis) and it’s during such verifications the new condition was discovered.

The FMEA analysis lists all possible faults which can occur for a critical function in an aircraft and the fault scenarios are then played through and simulated in the aircraft’s simulators. It was during the simulation of such a possible fault in a 737 MAX simulator at Boeing in Renton the issue was found by the FAA.

It has been questioned why the FMEA performed on the original Flight Control computer software didn’t detect the hazardous MCAS condition caused by a faulty Angle of Attack sensor. If properly executed it should have found how dangerous MCAS could be with certain system faults.

Now, the FMEA analysis worked as it should. It detected a problem, this time caused by how the fixed software changed the data flows in the flight control system’s computers.

189 Comments on “Bjorn’s Corner: New pitch trim issue forces further changes to 737 MAX software”

  1. How many more of these issues are hiding in the code that runs the aircraft? Maybe none, maybe ??? The stock market, which was right about the 787 grounding not being catastrophic, still suggests the MAX will return to service without physical changes and by the end of the year. What are the chances of this whole sorry affair getting worse yet? What are the chances that BA will end up having to make physical changes to the MAX?

  2. I have read about this new problem on other forums, news outlet etc. But now is the first time I read a coherent description on the exact nature of the problem, especially that the problem relates to testing the system behaviour when a simulated flaw has degraded the capability of the fcc.

    Can you say something about the layout of the FCCs?
    I understand there are two of them, but does each one contain one or two processors? And how do they work together? What is the level of comparison, data validity check, voting etc? Do they all run continuously, or are some idle at any time?

    • The 737NG and MAX have two FCCs, each with two processors. Depending on the flight mode (autopilot or manual flight) these processors either divide the tasks between them or one is active and the other is standby. If there is a fault in the active microprocessor the inactive one can take over in the FCC. This is as I understand it, I should add. There are descriptions of the FCCs and their global functions, but the exact working of the software is not public knowledge.

      • Interesting that the module to provide warnings of an error from the AoA sensors was omitted, and now we have new code that may well be doing a similar task, a problem has been found in terms of overloading a chip in the FCC?? Do we know if the FCC has reasonable processing headroom?

        • And the warning would have done no good to the Indonesian pilots as they had not a clue per the manual what it meant.

          For Ethiopian, they knew what the issue was, but handling it was aggravated by what the manuals (and simulators) lacked.

          What a bollocks.

          Now the disagree in 2.0 does something really useful, it turns the damned MCAS OFF!

  3. Given that all three events had a failure on the captain’s AoA and that the AoA was changed after the first event, it would possibly point the finger at a computer/software fault being behind the crashes, not the single source AoA.

    If so it would be sensible to wait for the conclusions of the accident investigations before allowing the 737 max to fly again

      • No, it’s not.

        Satcom looked at the data from the AOA. Each one has two separate (call them resisters) on the shaft.

        Each one feeds to a different computer.

        Both registered wrong data that was in the data feed released. As the only thing they have in common is the shaft, logic says that it was a failed AOA vane/movement that affected both and was not software input.

        Ergo, AOA had failed in both cases.

        These problems are not going to be solved by “enthusiasts”; they are way ahead of anyone outside the loops, sans someone like Bjorn.

        Bjorn clearly lists some parts are proprietary that are not available for analysis.

        • @TransWorld

          How are you so sure that an oversaturated FCC didn’t slow the response to the pilots’ manual trim switches in both crashes???

          Maybe the pilots’ manual trim switch commands were waiting too long in a queue while MCAS was working, and working???

          • Which ‘manual switch trim’?

            I doubt either the ‘pickle switches’ on the control wheels or the cutoff switches on the aisle stand involve a computer.

            I expect there is a function in a computer that stops stab trim when the force sensor at the bottom of the control column shows high force from the pilot pulling the control column back hard. It sounds like people are calling that a ‘switch’.

          • @Keith

            Pilots can trim the 737 manually using switches/buttons on the yoke (manual trim via the electric motor of the stabilizer).

            Pulling the column as on the NG no longer has any effect on the MAX. Boeing has disabled this functionality.

        • TW, as I remember the two accidents the scenarios were as follows; (1) the LA aircraft replaced the left AOA sensor two days prior to the accident flight. On the last flight the day before the accident flight the flight crew managed to ‘override’ the MCAS runaway. After the AOA sensor replacement, the new sensor showed 20+ degrees wrong. Why, I don’t know, – I have looked at everything from calibration errors to installing the new sensor ‘one screw hole wrong’. You can see the left side AOA being at fault on the preliminary report, – even during taxiing and take-off roll. It was at fault a long time before the accident.

          (2) On the EA preliminary report you can see that the left side AOA was good until the rotation, when it went off-scale.

          The preliminary reports also show that the flight crews tried to counter the MCAS actions, until, it looks, they gave up.

          Whilst I understand that the LA crew failed, I have greater difficulties in understanding the EA lack of correct actions, as MCAS had been ‘on the news’ for several months.

          I thought this subject was clarified once and for all. How wrong could I be.

        • “These problems are not going to be solved by “enthusiasts”; they are way ahead of anyone outside the loops, sans someone like Bjorn.”

          you may want to have a look at “SatCom Guy’s” vita :-))))))))))

        • That’s ‘satcom guru’, the blog of an engineer experienced in transport airplane safety – Peter Lemme, who apparently now concentrates on satellite communications technology.

          I appreciate his work but it is not guaranteed complete and sometimes is a bit vague.

    • The AOA was changed PRIOR to the first event. The faulty Lion Air replacement at Denpasar is what caused the failures on both flt 043 & 610.

      • Correct, but they had been having problems with that aircraft’s sensor readings for some time and had tried some fixes, and finally switched out the AOA sensor. Maybe they still have that “faulty” AOA sensor that was switched out? It would mean two bad AOA sensors in a row. Maybe a bad batch? A problem with the installation or testing / calibration of the replaced sensor? We don’t have the final report yet on the Lion Air accident. For a very new airplane to have two sensors go bad makes one wonder. Maybe it was some procedure during pushback that scraped the sensors? But I’m not 100% sold that it was a faulty AOA sensor, and not another cause in the air data computer or connectors etc.

  4. Hi Bjorn,

    thanks for the infos.

    Can you estimate – since the problem seems not to be related to MCAS – whether NGs could be affected as well? Do you think the problem was accidentally “introduced” with the MCAS software fix and that’s the reason it wasn’t found before? Or might the FMEA analysis last time not have been done as carefully as it seems to be done now?

    • I’m asking because – as far as I know – there are NG crashes whose causes are not fully understood. Especially “Flydubai Flight 981”. After an attempted go around they dived into the ground with an angle of 45deg. There is no official report out by now, so no FDR data, but to my knowledge the pilots are blamed. According to the preliminary report there was a “stabilizer nose down deflection from -2,5 deg to +2,5 deg at a height of 900m” which caused the plane to crash. In case the two experienced pilots did not lose “situational awareness” and the stab-trim was not reacting to inputs, it sounds like what you wrote in your article: “The FAA pilot classified the resulting pitch change and the delay to stop it as a “catastrophic” event, meaning the plane could crash if this fault would happen in flight.” Of course I don’t want to speculate, but I would like to ask experienced pilots around here about their views on this accident.

      • Thomas, the Dubai crash was a stick-forward & stabilizer trim event in heavy clouds, with no frame of reference for the crew. It was recorded by the FDR. After a pitch warning, they had the illusory perception of extreme nose up, and pushed the stick forward to apply elevator and also held trim switches for 12 seconds to apply stabilizer. That combination of elevator and stabilizer gave an extreme nose down pitch, with not enough time or altitude for recovery at that point.

        12 seconds is a lot of stabilizer trim, enough to get -1 g. So perhaps they were caught by surprise and gripped the column without realizing. Or perhaps the trim switch stuck. We don’t know what happened but the result is similar in that excessive nose-down stabilizer is difficult to overcome at low altitude.

    • Logic says the NG has the same issue. How many are flying right now with this setup?
      Are there “unexplained” nose down incidents affecting the NG?
      I seem to recall a comment – on a related thread – to that effect.
      It seems NTSB would need to dig into this ASAP.

  5. Thanks Bjorn. Is the 737MAX becoming a real “Band-Aid” aircraft trying to fix hardware problems with software patches. They are trying to fix the symptoms not the causes?

    • All aircraft that use computers have “patches”

      How many upgrades does your operating system have?

      Issues are found all the time and corrected (or so they hope)

      • Boeing was in a rush to get the MAX in the air because of the 32X-NEO’s, they also didn’t want to spend the money and time to make the landing gear (and bays/wing) changes. Share price, share price!!!

        Could we see 757’s in storage being dusted off?

      • Yes, electronic boxes change part number all the time with new revisions of software (and a nice bill for the update). One thought that avionics workshops would be out of work when the boxes went from electromechanical to computers, but just the work scope changed.
        Similar with modern cars, where they update software at many services and thus remove some errors and introduce new ones. Bad brazes will with time make some cars total electrical disasters (like the ones with massive harnesses and bad brazes, such as old BMW 7-series and Mercedes S-Class).

      • Your home operating system doesn’t have aerodynamic issues and seat 175-odd passengers.

    • It is a Schroedinger Plane.
      Once you look in the box you see that all parts are broken 🙂

      • Good line thanks.

        (You’re playing on an obscure comment by a scientist who was skeptical about a claim involving certainty, he dreamed up a scenario about not knowing whether a cat in a box with a Geiger counter and an isotope that triggered release of cyanide, is alive until open box.)

        • More like the Schroedinger plane is both ‘safer than safe’ and ‘not fit to fly’ at the same time.
          As in the cat analogy, MCAS software has a 50:50 chance of working correctly in certain circumstances, but you don’t know till you fly in it!

        • Not really an obscure comment.
          IMU it was designed to explain in laymen terms some quantum physics effect. Contrary to classic expectations
          the inner state in the box really is indeterminate until you test for it by looking inside. ( the effects gets a real world application in “entangled photon pairs” used for snoop proof communications. )

          • MMM……

            Very few ‘laymen’ have heard of the comment.

            It was a derisive comment by a skeptical scientist trying to illustrate the error he claimed another scientist was making.

            Things are what they are, they have identity. Your claim is false.

            Beware that people mis-use quantum theory and other science to suit their own ideology.

            For example, it was popular a couple of decades ago to claim that the technique called ‘fuzzy logic’ superseded Aristotelian logic in computations – never mind that the computers operate with binary logic – ones and zeros. While ‘fuzzy logic’ is difficult to get one’s head around, the claim was irresponsible, for the purpose of promoting the ideology that claims humans can’t be certain of things.

  6. The mystery is why this thorough FMEA analysis was not ordered by Boeing/FAA in their checklists after they changed the MCAS system logic/settings late in the Flight test program or did they miss it even for the first MCAS FMEA analysis?

    • Probably no properly working simulator at that time… Do they have properly working simulator now ??

      • Boeing has some and a few airlines have them; a big lot will be delivered before Christmas. But with Boeing’s development software systems (“Model based computing”) it is much easier to work through failure modes in all different combinations and see the time/response than to introduce them and run them in a full flight simulator. That you do afterwards per the FAA checklist and the chief test pilot’s add-on cases. The mystery is why the whole work now being done was not triggered by the change of MCAS 1.0 to 2.0.

        • The “Model” might not be accurate enough to produce reasonable test results (FMEA).

          • They are getting very good and accurate, but you still need to do the verification of the important ones in full flight simulators and then flight tests, with some exclusions for the very dangerous ones.

        • Nothing to do with ‘model based’ (which term you garble).

          Does take thorough work.

          It is claimed that updating the FMEA was simply not done, a major error as the design of MCAS morphed to aggressiveness.

          Several people were asleep – leads/managers should have raised a red flag at use of only one AOA input, for example.

        • Yes, when flight test discovers and reports an issue like this, the old system probably had a routine where a chief engineer with project engineering set up a system with actions, accounts and reporting milestones/instrumented tests. If unplanned problems pop up in the work (like during a full FMEA), further resources and accounts get activated, and certification moves a bit to the right if your problem is the limiting work. Most likely they have 30-190 test issues that require proper engineering work and testing, and the one that delays certification the most gets all the heat…

          • Re Pablo June 30, 2019 (no Reply button displayed):

            You are too glib.

            I pointed to the switches on the control wheels, in my day called ‘pickle switches’ which pilots use routinely, and to the cutoff switches on the aisle stand behind the throttles, which are for emergency disabling of stabilizer actuation.

            I’ve _speculated_ that the problem just discovered is that processing of force sensor inputs is very slow. But that might be seen in normal operation.

            (Original 737s had what was called ‘control wheel steering’ with which IIRC pilots could maneuver the airplane a bit while on autopilot by moving the controls, IIRC that’s what the force sensors were for.)

            Yes, people are saying the MAX does not stop stabilizer movement by pulling back hard on a control column, and are referring to a ‘switch’ that is supposed to do that. You have not clarified that.

            I know there is a force sensor at the bottom of each control column, perhaps there is a switch specific to the stop function, but even Peter Lemme is not clear.

  7. This is by far the best explanation I’ve seen.

    But an actual fault in a microprocessor. That’s beyond rare. Because of this, are we talking about a physical malfunction of the microprocessor. In other words, a hardware failure. If that happens, normally the operating system won’t boot or the operating system will shut down, meaning the FCC simply doesn’t work, at all. Overwhelmingly there is no inbetween situation.

    So even though this is the best explanation by far, it’s still as clear as mud. Why won’t Boeing simply explain the fault condition in explicit terms?

    In other words cause and effect. We know the effect, runaway trim stabiliser. But we have not been told the cause, the actual fault that triggers the effect.

    The article later on does use the word “code”, so I do think we are talking about a software fault, not a microprocessor fault or hardware fault.

    Here’s a question for you journalists to ask Boeing. We are being told that there is a flood of data, possibly causing the microprocessor to run out of memory. What is causing the flood of data and why is it generating a flood of data?

    It all reminds me of what I said a week after the Lion Air crash. The FCC went into “meltdown”.

    A two month delay? How many other error conditions have Boeing not checked? It takes years and 1000s of man years to write control software that is safety critical and therefore mandatory.

    There lies the issue. MCAS is safety critical and therefore mandatory. Why?

    • It’s Microsoft country… I had a Windows update kill my daughter’s HP EliteBook 8540w, and heard later of a Dane suing Microsoft for killing his HP laptop with the same update; he won a few $1000 from them. So faults in computers from hardware, or software-induced hardware faults, are quite common, and it is good that the FAA tests these faults as well.

      • Software induced faults are common, but not hardware induced faults.

        But I agree, testing error conditions is an unconditional necessity.

        With hardware, the hardware controller on the microprocessor checks that the hardware works. If the hardware controller signals an error the operating system won’t boot. In other words, it’s not the job of Microsoft or other operating system developers, it’s the job of the chip manufacturer.

        • Philip, I believe Bjorn is saying that the FCC has a data overrun condition, it cannot keep up with the dataflow that results from a specific condition. The software must manage data so as to remain within the processor limits. So that is what Boeing must do now, change the software so the dataflow limits are not exceeded for any case.

          • You are right. I’ve provided a link below, which to me means that everything falls into place.

            For example, it explains why the FCC didn’t address alpha vane failure conditions and out of bound conditions. The CPUs aren’t big enough to do ‘fancy’ (sarcasm) things like that.

        • Bugs in processors are quite common. Ever looked at the errata documents of Intel for their recent processors? Quite a long document. This is the reason why for many security relevant tasks old processors are used. Most of the time they are sufficient performance wise and you know where the bugs are.

        • In our case it was a software update that happened to overstress some hardware; then during restart the processor identified the failed hardware and shut the computer down. Other systems like telephone switches have redundant boards, so when they identify failed hardware they just block that board and use another one. A consumer PC is not built that way. Apparently it is cheaper to build massively parallel boards with crappy hardware than to build a few with top-notch components, and most telecom software engineers write code to handle failed crappy hardware.
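          The “block the failed board and use another one” pattern can be sketched in a few lines of Python. This is only an illustration of the general idea: the `Board` and `RedundantSystem` names, and the way a fault is signalled, are made up for the example and are not taken from any real telecom or avionics system.

```python
class Board:
    """A hypothetical processing board that may develop a hardware fault."""

    def __init__(self, name):
        self.name = name
        self.healthy = True   # flipped to False to simulate a hardware fault
        self.blocked = False  # set once the system takes the board out of service

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} hardware fault")
        return f"{self.name} handled {request}"


class RedundantSystem:
    """Tries boards in order, blocking any board that reports a fault."""

    def __init__(self, boards):
        self.boards = list(boards)

    def handle(self, request):
        for board in self.boards:
            if board.blocked:
                continue
            try:
                return board.handle(request)
            except RuntimeError:
                board.blocked = True  # never route work to this board again
        raise RuntimeError("no healthy board left")


system = RedundantSystem([Board("board-1"), Board("board-2")])
print(system.handle("call 1"))     # served by board-1
system.boards[0].healthy = False   # simulate a fault on board-1
print(system.handle("call 2"))     # board-1 is blocked, board-2 takes over
```

          The sketch only works because a fault is detected cleanly and the spare board is healthy; a fault that degrades a processor without stopping it is a harder case.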

          • With the possible exception of the programmers who did not envision a very large number of calls to 911, so their software shut down thinking something serious was happening to it or the CPU. Shut down 911 service over several states in the US.

            As was the case when the entire ACARS system shut down a few decades ago.

            Took staff a half hour or more to figure out that something was wrong in the international section and shut it down, took several hours to figure out the problem was that some techs had coded a new installation with the same address as a flying airplane, software couldn’t cope properly with that.

    • “How many other error conditions have Boeing not checked?”

      I would rather ask: How many other flaws is Boeing trying to hide?

      The most recent flaw, an “FCC meltdown”, is “catastrophic”, but was assessed by Boeing as only “major” – that means two levels below, the least dangerous. Arrogant as…. (sorry for the word but I think it is a precise description of Boeing’s “safety culture”)

    • – But an actual fault in a microprocessor. That’s beyond rare. Because of this, are we talking about a physical malfunction of the microprocessor. In other words, a hardware failure. If that happens, normally the operating system won’t boot or the operating system will shut down, meaning the FCC simply doesn’t work, at all. Overwhelmingly there is no inbetween situation.

      Sorry? Absolutely false. It’s quite possible for faulty logic to work fine during boot-up, only to be overwhelmed when stressed. It’s quite common in consumer PC land. This is not binary “Works/Broken”, there are many factors that can lead to wonky hardware:

      – Heat
      – Voltage sensitivity
      – Memory registers that are subpar, only acting up when used (or when hot/certain voltage enters it etc. etc).

      • Not true. Just look at the errata for Intel chips. Intel publish their flaws on their web-site

    • The FCC is operating in real time, constantly collecting inputs from sensors (air speed, temp, AOA, etc). It does this mostly sequentially, not in parallel (unless there are sub processors somewhere in the mix). So, there is a large highway of data, being processed by one toll booth taker. If too many cars, at rush hour, try and get processed through the toll booth, a delay factor happens. In order to speed traffic through the process, you can meter the traffic by sampling non-important data at a slower rate (fewer cars entering the highway). Or train a new processor to be faster in taking tolls. Or use parallel processing by putting in more toll takers (processors), and have a master processor keep track of things (much more complex, low level programming, mostly at the operating system level). Maybe the new fly-by-wire spoiler system added more data and output processing along with MCAS to clog up the highway on the FCC? Maybe at high temperatures, some CPUs built on a Monday were prone to fail? I do wonder how a single AOA sensor signal along with no limits on the trim command were able to be certified or even proposed, and not caught in a design review process. Or why the stab trim cutout switch functions were changed on the MAX? Obviously, the FAA (Congress) has let Boeing have too much control over the certification process.
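      The toll booth picture can be put into numbers with a toy simulation in Python. All figures here (10 items of capacity per frame, 8 or 14 items arriving) are invented for illustration and say nothing about the real FCC; the point is only that a backlog grows without bound once arrivals exceed what one sequential processor can clear, and that metering non-critical data brings it back under control.

```python
def simulate(frames, arrivals_per_frame, capacity_per_frame):
    """Return the backlog left after each frame for one sequential processor."""
    backlog = 0
    history = []
    for _ in range(frames):
        backlog += arrivals_per_frame                # new data enters the queue
        backlog -= min(backlog, capacity_per_frame)  # the processor clears what it can
        history.append(backlog)
    return history


CAPACITY = 10  # hypothetical: items the processor can handle per frame

# Normal traffic: 8 items per frame, below capacity, so nothing queues up.
normal = simulate(5, arrivals_per_frame=8, capacity_per_frame=CAPACITY)

# Assumed fault case: 6 critical + 8 non-critical items per frame.
faulted = simulate(5, arrivals_per_frame=6 + 8, capacity_per_frame=CAPACITY)

# Metering: sample the non-critical data at half rate (8 -> 4 items per frame).
metered = simulate(5, arrivals_per_frame=6 + 8 // 2, capacity_per_frame=CAPACITY)

print(normal)   # [0, 0, 0, 0, 0]
print(faulted)  # [4, 8, 12, 16, 20]
print(metered)  # [0, 0, 0, 0, 0]
```

      In the faulted run every new item, including a pilot command, waits behind a queue that is four items longer each frame, which is the kind of delay the toll booth analogy describes.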

    • While a microprocessor fault is rare, there is still a chance of occurrence.

      An FMEA has 3 parameters:
      P – Probability
      S – Severity
      D – Detectability
      There are a few ways of processing this but when designing automotive electronics, we had each factor ranging from 1-10 and then calculated an RPN score which is P*S*D. Then you address the items above a highest acceptable risk score and bring them down to acceptable levels. For us, where you have an RPN range of 1-1000, we had a highest acceptable RPN of 90-100 at the start of a program and 60-80 by the time we go into production. Once all are within acceptable range, you continuously work down the top 10 items (as you reduce risk, these items keep changing) till you go into production.
      However, there is one exception. If a severity is 10 (potentially fatal) or 9 (severe injury), the risk is unacceptable regardless of other parameters. So if D =1 and P =1 (easily detectable, highly unlikely) but S=10, it is still a no-go and we can not go into production.

      The same applies here, it does not matter how rare a fault it is, if the severity is “catastrophic”, the severity must be reduced no matter what. So the difference between Major and Catastrophic is big: Major is acceptable as long as the probability and detectability are very low while Catastrophic is unacceptable no matter what and becomes a showstopper.
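      The screening rule described above can be written out as a short Python sketch. The limit of 100 is one value from the 90-100 range mentioned for the start of a program, and the three failure modes and their scores are invented purely to exercise the rule; none of this is data from a real FMEA.

```python
from dataclasses import dataclass

RPN_LIMIT = 100    # assumed highest acceptable RPN (start-of-program value)
SEVERITY_VETO = 9  # severity 9 or 10 is unacceptable whatever P and D are


@dataclass
class FailureMode:
    name: str
    probability: int    # P: 1 (highly unlikely) to 10
    severity: int       # S: 1 to 10 (9 = severe injury, 10 = potentially fatal)
    detectability: int  # D: 1 (easily detectable) to 10

    @property
    def rpn(self) -> int:
        """Risk priority number, P * S * D, in the range 1-1000."""
        return self.probability * self.severity * self.detectability


def is_acceptable(mode: FailureMode, limit: int = RPN_LIMIT) -> bool:
    # The severity veto comes first: a low RPN cannot rescue a severe failure.
    if mode.severity >= SEVERITY_VETO:
        return False
    return mode.rpn <= limit


# Hypothetical entries, for illustration only.
modes = [
    FailureMode("connector corrosion", probability=4, severity=5, detectability=4),
    FailureMode("display flicker", probability=6, severity=3, detectability=7),
    FailureMode("runaway actuator", probability=1, severity=10, detectability=1),
]

for m in modes:
    verdict = "acceptable" if is_acceptable(m) else "must be fixed"
    print(f"{m.name}: RPN={m.rpn}, S={m.severity} -> {verdict}")
```

      The last entry has the lowest RPN of the three (10) and is still rejected, which is the exception for severity 9 and 10 in action.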

    • I think there was too much data being added by the new FBW spoilers and MCAS and the computer processing slowed down. Without sub processors built into the mix, data has to wait in a hold queue for the CPU to be done with its current task in order to work on the next piece of data. They can slow the sample rate of non-critical sensor data, or put in a faster CPU with more programming changes, or add more sub processors and break up the program into sub modules with a lot more complex, low level programming. The last two options involve major surgery to the code, unless there is some off the shelf, already proven, processor upgrade kit.

    • Faults in microprocessors are not as rare as you think. Especially when you’re above quite a lot of the earth’s atmosphere and nearer to the radiation of space.

  8. It isn’t a real mystery, Imho, it was simple, arrogant rush for money.

    • Bingo!! And for profit, 346 people were wiped out.

  9. The inescapable reality is they freaked when AA ordered A320NEO and the suits and bean-counters overrode engineering and ordered a kludge job. Putting engines forward of COG and relying on software to sort it out? Now the 737NG processors can’t handle the data traffic.

    Honestly, Boeing have just made the worst set of cascading decisions, which down under we categorise as a Fuster Cluck. I won’t fly on one of these things.

    • Put a small fixed stabiliser at the top of the fin, able to generate a pitch moment equal and opposite to the effect of the forward-mounted engines. Always there, always available, no power or switches or software required.

    • I’m guessing that there is a lot of pressure from Boeing’s management to improve the financial results, doing more with less. So, corners are cut, including proper design reviews, FAA quality checks, illegal sign offs etc. So, Boeing is left with grounded planes, large engineering problems, and a stock price still up year-over-year, after two plane crashes. I wonder when things will change?

    • It is significant.

      It’s not fatal per se, but it is another data point that this got through the system.

      Much like the FOD on the KC-46, is the core morale, the system, combination of both?

      It’s not that something was done wrong, it happens to all of us, it’s that it got through with approval.

      • Yep, that’s about the size of it.

        Boeing like to say that they have an Andon system, something right out of the Toyota book on How to Run a Factory. However the long list of quality problems indicates that they’ve not taken it to heart. Besides, there’s way more to the Toyota Way than Andon…

        The problem American businesses have with Toyota’s Way is that it seems utterly incompatible with a hire/fire culture, cost efficiency, screwing suppliers for every cent, the societal class system that pervades US industry, and keeping out the unions. The reason for the apparent incompatibility is, because it actually is incompatible with those practices.

        Essentially it’s all about winding one’s neck in, and going for long term success irrespective of the short term consequences! Clearly Boeing hasn’t embraced that. There are whole books on this stuff, essential reading for anyone in engineering.

      • Incidentally, one of the hallmarks of an Andon system is stopping whatever’s happening when a problem occurs, identifying the problem, deciding on how to fix the problem through consensus building, and fixing it before starting up the line again. What you don’t do is keep making whatever it is to fix it later on.

        It’s a bit more subtle than that – you don’t necessarily shut down your entire production line – but that’s the basics. In keeping making MAXs Boeing are ignoring the very system they claim to have put in place. Andon isn’t just at the microscopic scale of an individual on a production line, it applies to all processes across the WHOLE COMPANY, and all your suppliers.

        A consequence of it is you end up being very nice to your suppliers and staff, because you learn how important they (and their willing participation in self improvement) are to you.

        What’s this, an engineer giving out business management advice? Well, yes; you have to know this stuff these days to be a good engineer.

    • “Being cheap” can turn quite expensive 🙂
      Excessively pressuring subcontractors on price will create further blowback in the future.

  10. To add further. It is now clear that runaway trim stabiliser constitutes a catastrophic condition that will result in the loss of an airplane. Specifically we know the elevators become inoperable and manual trim becomes inoperable.

    Why are the FAA allowing Boeing to address runaway trim stabiliser using software thereby introducing the possibility of a software error. It can’t ever be allowed. So use a hardware fix to prevent the stabiliser from running off in the first place. In the alternative make hardware changes to ensure the elevators remain operable and manual trim remains operable.

    We all know the answer. It will take time to introduce hardware changes. Money first, people last.

    • FBW aircraft use all software and computers to fly.

      Are you saying that should be disallowed and go back to wires and cables?

      • FBW means the cables are removed and replaced with wires that transmit electricity for electrical activation.

        To answer your question. No. Airbus airplanes are naturally stable. So Airbus software doesn’t fight the airplane. The 737 MAX is not naturally stable. So the software does fight the airplane.

        • you have a very simplistic view. in this case FBW/non-FBW is a difference without distinction, i.e. at the end of the day, in both cases computers are filtering the pilots inputs through software and causing flight control surface movements not directly commanded by the pilot.

          whether those signals get from A to B by a cable or a wire is irrelevant.

          additionally, the Airbus FBW jets _are_ configured with _reduced_ stability margins vs Boeing’s non-FBW jets (in normal attitudes) and rely on the faster response speed and inherent filtering of the FBW system to make the plane _appear_ to fly like an aircraft with traditional stability margins. this is all to save fuel by reducing the amount of negative lift produced by the stabilizer.

          The 737 Max, at very high angles of attack, has a rapid onset reduced stability configuration (bordering on unstable) Which MCAS was intended to address (why they didn’t just go with a stickpusher/stall warning I can not comprehend). in normal AOA situations, the 737 Max is no less stable than the NG, it is only in outside the envelope situations where the stability curve diverges.

          you keep trying to sell that the 737Max is not naturally stable and that is a bald faced lie. _all_ aircraft have stability curve divergence thresholds. the Max’s is not as far outside the normal envelope as the NGs and more aggressive, but more than anything _different_ from the NG and they wanted to keep the same type rating.

          but it is not naturally unstable as you keep trying to say.

          • I don’t have a simplistic view, I have layered view or building block view.

            As a comparison, the A350 has 3 primary and 3 secondary FCCs/FMCs. It is the secondary computers that execute direct law. So they send signals down the wires that do not moderate the actions of pilots using software algorithms. It is the primary computers that implement software algorithms that moderate the actions of pilots.

            The primary computers can be turned off meaning the pilots fly the airplane without moderation.

            And so on. Not simplistic. Just layered.

            So, as I said, Airbus airplanes can be flown without moderation from software algorithms. Apparently the 737 MAX can’t. There is only one reason for that. It’s dangerous.

          • There is a difference between certifying commercial and military aircraft regarding pitch stability in subsonic operations: commercial aircraft have to be stable with some margin, while almost all modern fighter aircraft are pitch unstable below M=1, with computers needed to fly them. In supersonic flight the center of lift moves aft and makes them more pitch stable. We will see how Aerion and Boom will solve this in their SSBJs’ fly-by-wire systems.
            Just see the Sukhoi RJ FBW system being put in alternate mode by a lightning strike, after which the pilot got into PIO even without other combined faults like one engine out or hydraulic/other electrical system faults, on a clear day in OK weather. So designing aircraft control systems in pitch mode is not that easy.

        • Stop!.

          You should read Bjorn’s articles on the reason for MCAS.

          (Which by the way is not ‘unstable’ but control feel approaching stall, which might contribute to a pilot pitching up into a stall in an emergency.)

          • I assume you’re referring to Bjorn’s article on pitch stability series.
            If MCAS is there for ‘control feel’ then I would think it would tighten up pressure directly on the control column. Not, turn the stabilizer down, ten seconds at a time, at the high speed trim setting, until the AOA is back into a ‘safe’ range. To me that’s a stall prevention system. I agree that there may be a control feel issue close to stall, but, to trigger the stab trim motor, at that point isn’t for ‘control feel’, it’s for stall prevention.

        • Davenport, you have it backward.

          MCAS is intended to avoid the over-controlling not to change feel.

          Badly implemented of course, not necessarily a good way to address the concern, IMO far too aggressive for the phenomenon of concern.

          (Engineers and pilots and lawyers do often over-react. For example, circa 1968 Trans World Airlines wanted to have a flashing red light to warn pilots they’d busted through an assigned altitude. Fortunately a sharp Boeing test pilot put some perspective into the discussion, pointing out that engine fire and autopilot disconnect were far more serious things to draw pilot attention to.
          He became Chief Pilot of the 767, which inhibits fire warning until 400 ft AGL because Job 1 is to fly the airplane to an altitude where it is safe to start action to deal with the fire, including starting the return to the airport.
          The tragedy of an L1011 in the Everglades illustrates the importance of someone flying the airplane: somehow altitude hold disengaged and the airplane drifted down into the swamp.
          IIRC it was 707 Flight Deck supervisor Jim Tsai who took the lead in organizing a meeting of projects within Boeing to get a common approach in Boeing flight decks. Steady good guy, Jimmy.)

  11. First of all, thanks Bjorn/Scott for sharing your proficiency and insights with us!
    1. The FAA gets the credits on that one!
    2. It is hair-raising to see that Boeing did not submit a thorough software fix.
    How could that possibly happen with a company that has blessed civil & military aviation with magnificent platforms (incl. the 737) and knowing that the MAX is essential to the company? I read many times exec comments that the fix is the focus of the airframer. What went wrong?

    • Well put.

      It’s clear Boeing’s system of changes is broken.

      The simple answer is to go back to what worked.

  12. The fault was found. Presumably it can be fixed. What I find alarming is that whereas the (seemingly newly assertive) FAA categorised this as ‘catastrophic’, Boeing initially categorised this as only ‘major’. To me this implies those involved at Boeing either a) don’t understand safety sufficiently or b) do understand it but continue the apparent relative lack of concern for it vs commercial matters. I would have hoped that by now they would have been absolutely clear safety is non-negotiable.

    • This is not an actual microprocessor hardware error.

      Most probably it’s one of 4 things
      1. The system is running out of memory due to the increase in the amount of data
      2. There’s a memory leak causing an out of memory condition
      3. Data structures are not sized correctly causing the Real Time OS to crash
      4. Microprocessor is running out of cycles processing all of the data, hence the additional latency in response time

      Only out of memory and running out of cycles will require a hardware change unless they go into the software and reduce the amount of data that needs to be processed
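The "running out of cycles" case in the list above can be sketched with simple utilization arithmetic. This is only an illustration: the task names, periods and execution times below are invented and are not taken from the 737 flight control computer. The idea is that once the work demanded per unit time exceeds what the processor can deliver, unfinished work piles up and response latency grows.

```python
# Hypothetical task set: (name, period in ms, worst-case execution time in ms).
# None of these figures come from the 737 FCC; they only illustrate the arithmetic.
TASKS = [
    ("sensor_read", 20, 5),
    ("trim_logic", 40, 10),
    ("pilot_command", 20, 4),
]


def utilization(tasks):
    """Fraction of CPU time needed: sum of execution_time / period."""
    return sum(wcet / period for _, period, wcet in tasks)


def backlog_after(tasks, horizon_ms):
    """Work (in ms) still unfinished after horizon_ms if demand exceeds capacity."""
    demand = sum((horizon_ms // period) * wcet for _, period, wcet in tasks)
    return max(0, demand - horizon_ms)


normal = utilization(TASKS)

# Assumption: a simulated fault adds a burst of extra data to process every 20 ms.
faulted_tasks = TASKS + [("fault_data_flood", 20, 8)]
overloaded = utilization(faulted_tasks)

print(f"normal utilization:  {normal:.2f}")
print(f"faulted utilization: {overloaded:.2f}")
print(f"backlog after 1 s:   {backlog_after(faulted_tasks, 1000)} ms")
```

With utilization below 1.0 the backlog stays at zero. Above 1.0 it grows without bound, which would show up to the pilot as commands taking effect later and later.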

      • You are right, it’s a software error. But I think a very serious one. The flood of data strongly suggests that the FCC isn’t properly handling events.

        A good analogy is a Leslie Nielsen movie. The FCC ends up giving instructions to everyone and everything without ever properly addressing the consequences of the instructions.

        It is probably the reason why the pilots suffered bedlam on the flight deck. In other words, everything kicked off at the same time. Meltdown

      • Alan, I was thinking of causes similar to those you mention. I remember a situation on a large dynamic positioned ship. In (a critical) operation both computers failed within about 30-60 seconds. The operators were instructed to reset the computers prior to each ‘mission’, but only did this to the operator interface. Finally, (one of) memory, registers and GPS clocks ran full, and shut down the computer. The time difference was related to the previous reset; reset one, then the other.

        I believed FCCs were programmed by separate groups, but perhaps they used the same detailed code. And I take it for granted that the CPU hardware used in the two FCCs are from different production batches.

        Again, interesting thoughts, which I fully agree on.

      • Fast code is often fat code, so they can probably rewrite the code to be more efficient, but this suggests the flight computer is approaching its limits for development. If I were an airline I would demand some assurances in the contract to avoid potentially hugely expensive avionics upgrades in the future.

        • It’s interesting that the new 777X is a further step up from the 777 when it comes to FMC and the software the pilots use.
          Boeing has completely upgraded the 777X hardware to have touch screens and all that implies. These are expected changes from software/hardware that dates from the early 90s.
          Yet the 737 Max had hardly any changes from the 737NG which dates from middle-late 90s and that again retained features of the earlier 737 series.
          Update considerably the newer plane but hardly change the very oldest model?

          • Biggest customer, SW, wanted the same cockpit/rating. No big changes possible.

  13. IMHO, at the root of the situation are missed aerodynamic tics, and possibly aerodynamic hysteresis. If my speculations happen to be correct, existing 737 MAX’s need VG’s, leading edge anti-stall strips, little deflectors, and similar features one can see on other aircraft. All have drag penalties so the 737 MAX economics greatly worsen and it will phase out quickly.

  14. The 737 MAX is not a safe plane!!!! bring on the NMA.

  15. Bjoern, please clarify this; first you state ‘- the FAA has found a flaw in the software Boeing has developed to fix the MCAS problem’. Later you say ‘- not in the part of the code which handles MCAS’. To me these look like contradictory statements, unless the new MCAS code is the root cause for what looks like a ‘CPU overflow’.

    When I first read this item (elsewhere), I understood that FAA, as part of verification of the new MCAS functionality, carried out a thorough test of FCC functionality. Do we know whether the new item is related to AND or ANU functionality, or both? And wouldn’t the CUT-OUT switches ‘assist’ in solving the problem?

    And isn’t it, so that the aircraft computer terminology is a bit tricky to keep a good oversight of. Sometimes functions are defined as ‘system’, – such as Speed Trim and Stall and Yaw functionality. It may all be in one hardware box, such as the Autopilot unit, containing several printed circuit boards, each containing several CPUs and ‘systems’, each doing specific tasks. Do you know how the B737 MAX flight electronics is configured?

    • the FAA introduced a new test case where they first disabled one of the 2 FCCs (which puts the FCS as a whole in a reduced capability state) then put the aircraft into a runaway stabilizer condition.

      with only one FCC operative, the computer could not keep up with the processing load, leading to higher than normal response latency to pilot commands.

      this indicates that they need to do some subset of the following: buy faster/more powerful FCCs, change functional priorities, degrade certain capabilities more aggressively in one FCC mode, optimize the software etc.

      • Thank you Bilbo, this clarifies the issue and why it was not discovered before.

      • No, the two FCCs don’t share the processing. But each FCC does have two CPUs or cores. Perhaps they shut one of the CPUs down leaving just one CPU to do the processing. It’s easy to do.

        A $1000 Intel CPU runs at more than a 1000 Mips. If that isn’t enough, god only knows what’s happening!

      • bilbo, can you explain what the FCC has to do with yoke trim switches? Elsewhere I read that they feed directly to the actuator. Also, since elevator authority is supposedly retained in the “fix,” why not grab trim wheel and cutout? All because of the catastrophic categorization?

      • The link I’ve provided below makes clear shutting one of the CPUs down doesn’t change anything. The speed trim logic is single channel but also single CPU. So it only runs on one CPU

        • Yes, but that CPU still has a higher load if the other is disabled. That would be why the problem arose when reduced to one CPU, but did not appear earlier with 2 CPUs.

          The FAA is now being very thorough in the face of intense criticism, and to establish their independence from Boeing. That’s a good thing, but my guess is if the same thoroughness were applied to other aircraft, you might find similar faults that are unlikely but possible.

  16. Q: Who decided that the non-FBW 737 airframe is ok for a 68” (and then 70”) fan engine? Were any of the 737-300 aerodynamic engineers involved in that tragic decision?

    • Some of the aerodynamic engineers gave their honest opinion, which was ultimately overruled by second level managers and the 737 MAX program leadership. “We have airplanes to deliver”, they said.

  17. If this debacle continues, is there a possibility of BA building more -800s (and -900ers?) as low-value (to BA) stopgaps so their customers can fly *something*?

  18. 2 weeks of tests vs 8 months of development….
    At least it should quiet, for a time, the argument that the under-funded FAA cannot match the expertise of the OEM’s best engineers, which is the reason given for the whole certification delegation …

  19. Moon of Alabama used to program similar microcontrollers. He suspects that they are of the Intel 80286 family, and he discusses the programming issues.

    “These old processors are very reliable and error free. But they have less than 1/1000nds of the computing capacity of a modern cell phone.”

    Pretty easy to imagine the additional overhead caused them to hang.

  20. At some point, we will all have to admit the fact that the 737 is not fly-by-wire. The two FCC channels were not meant to be single fault tolerant.
    Fault remediation is through handover of pilot flying, and it takes time. More time than is required to solve an MCAS-induced trim runaway, or the underlying instability it tries to solve. Or other issues with the FCC that will pop up from now on.

  21. Is the 737 grandfathered to run on computer punchcards? Maybe they can try that.

  22. This link, if true, provides great insight

    It uses 80286 CPUs. Haven’t used those for more than 30 years. As the article makes clear they have 1/1000th the processing power of my mobile phone.

    The mind boggles. Basically we are being told that they are using a software architecture that is 40 years and more old. So I stand corrected. I’m not surprised the CPUs have run out of grunt.

    I am not flying on that airplane.

    • Well … I’m okay with them using 80286 processors. Porting the software to anything else needs a full requalification, and decades of testing the software in real life would be thrown out of the window. It would perhaps be easier to write an 80286 purely in software running on a more modern CPU to run the flight control software than to port the software to ARM or x86-64, for example (modern x86 processors aren’t CISC processors like the 80286 but RISC processors with a translating layer, thus I don’t consider them drop-in replacements).

      • Intel processors are CISC processors, which means they are not RISC processors. There is no translation layer.

        But anyway, 16 bit addressing means a maximum of 64KBytes of memory. This must hold the operating system, the FCC logic and the FCC data.

        So for example the FCC did not have the logic to understand that an AoA of 75° isn’t possible. It was that omission that caused the Ethiopian crash

        Boeing can’t fit in the logic and can’t fit in the data and there isn’t enough processing power to process it anyway.

        Not for me.

        Presumably you did read the errata descriptions because there are no critical faults in Intel chips. Minor inconveniences only.

        • Well … x86 processors look from the outside as if they are CISC. The commands of the x86 ISA are translated into RISC-like micro-ops. The reason is simple: most of the nice architectural tricks that make processors fast today are easier to implement and work better with RISC. This was introduced with the Pentium Pro, I think. So an 80286 is quite a different processor than a modern processor, from the deepest layer of the technology.

          Furthermore the 80286 had a 24 bit physical address space. You can have up to 16 MB with it.
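The two figures being argued over here come from the same arithmetic applied to different widths. As a small sketch (the helper name `addressable_bytes` is made up for illustration): a 16-bit offset can reach 64 KB within one segment, while a 24-bit physical address can reach 16 MB.

```python
KIB = 1024
MIB = 1024 * KIB


def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


segment_limit = addressable_bytes(16)   # 16-bit offset within one segment
physical_limit = addressable_bytes(24)  # 24-bit physical address bus

print(f"16-bit offset:  {segment_limit // KIB} KB")
print(f"24-bit address: {physical_limit // MIB} MB")
```

So a 64 KB limit applies to a single segment, not to the total memory the processor can address.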

          By the way: Minor inconveniences in the errata? Those minor inconveniences range from really minor things up to bugs leading to a freezing cpu under strange conditions. There are many bugs only masked by workaround introduced by the BIOS. And then there is stuff like SPECTRE/Meltdown

          To get back to the topic of this website: the problem is that the FCC is a mature system. It has matured on the hardware platform. I assume when you want to ask some of the developers, you have to use a shovel. Porting it somewhere else is quite an endeavour; you throw all the maturity out of the window and you have a new processor based on a completely different foundation, or a new operating system on a new ABI. This would take years.

          I don’t think that the bug will be overly hard for Boeing to fix. It could be just a different prioritization, or it could be an optimized MCAS. We don’t know, because we have no data on what this new bug is, so speculation is not warranted.

          As someone who is self-loading freight and works in the area of server systems, I prefer the stuff in the avionics bay to run on extremely mature components, and if the body of code has matured together with the hardware, keep them together as long as you can.

          • I do admit there may be versions of the 80286 that do extend physical address space to 24 bits. I don’t think Intel did it. I think they moved to the 80386 and then the 80486 onto the Pentium. The Pentium was a true 32 bit processor. Can’t be bothered to look it up. So I accept you may be right

            The rest of what you say. No. There are billions of Intel processors running every possible kind of software without issue. Indeed I’m of the view that the 80286 is far less tested than the processors that superseded it. The reason, volume of use.

            You offer the argument of don’t fix something that isn’t broken. Boeing keep fixing it. But now it is broken.

            Forty year old CPU chips are not safer. Technology has moved on. The newer CPU chips are safer

          • By the way, quoting Spectre/Meltdown just shows how far some will go to blow smoke.

            Spectre/Meltdown requires rogue software. It’s far easier to install rogue software on 80286 based computers. Once there, the software can do anything. The bouncing ball on a Windows 2.0 computer is a simple example.

            Sorry, the smoke is not for me

          • Sigh. How should I express this in the most polite way? Let me say it this way: the foundations of your reasoning may need a reevaluation.

            The 80286 had a 16 bit data bus and a 24 bit address bus from the beginning. This was one of the main points behind the introduction of the CPU. It was one of the interesting features in the IBM Model 5170, better known as the Personal Computer/AT. Other manufacturers of the CPU, like Harris, accelerated it by giving it a higher frequency, but some features were there from the beginning.

            From today’s perspective the 80286 is a simple processor. And this is indeed a good thing. It is well understood and its behavior is well known. It’s manufactured on a rather coarse process node, thus the structure widths are quite wide, which helps on the radiation side. Consider that these CPUs are used at great heights. For the newest CPUs you simply don’t know how, for example, a Ryzen or an i7 will work over 30 years in such an environment.

            In these newer processors there is a lot of new technology. The structures are extremely fine, and they have vast numbers of transistors for features that you don’t use in such a situation, features an 80286 doesn’t even have.

            Aviation frequently uses ancient computing technology. Keep in mind that, for example, an A320 is running on the 68000 and 80186. The 68000 was introduced in 1979; the 80186 is a more highly integrated version of the 8086. And an A320 is doing full FBW on these old processors. The 777 is running on the 486 to my knowledge and I’m pretty sure that they won’t change this for the 777X. Why? Because it works! Most systems are using operating systems that are ancient from the general IT perspective. VxWorks for example. Why? Because it works, it’s well understood and there are no surprises.

            Not only the certification of an aircraft is a product of the time in which it was invented. Necessarily the computing infrastructure in those systems is such a product as well. You make a decision, you buy enough stock to get to the anticipated end of lifetime of your product when the “last round” call is coming, and you stay with it. Because that is what the system was designed and tested on, and where your product matured.

            The comment about SPECTRE/Meltdown is not about blowing smoke. There is an important point behind it. Older CPU types are not susceptible to SPECTRE/Meltdown for a simple reason. SPECTRE/Meltdown came out of some changes made to accelerate the CPU, whose impact wasn’t well understood. As I just explained, understanding the behavior of the CPU in all situations is paramount when you are using it in safety critical situations. It’s an example of features that you don’t need introducing errors. It’s an example of added complexity bringing unknown and not well understood behavior. You don’t want this complexity in safety critical features because you don’t know when and how the complexities will haunt you.

            Processors are quite buggy nowadays. I don’t know your background, but you as a user or an application developer don’t see the problems, because others have already covered them. But ask the firmware or operating system people, who have to ensure that certain conditions are never met when using the processors, and they will tell you a pretty different story. I could tell you some nice stories from the trenches. And the list gets longer and longer.

            Your comment about “many people are using this, so it’s well tested” is only partly correct. What many people are testing by using it are the frequently used code paths. The code paths of Windows or Linux, running frequently used applications like Word, Apache or even some large databases, with code generated by gcc or the Visual compilers, are well tested. But as soon as you leave the frequently used paths the story looks different. The amount of testing is vastly reduced. Think about the users of the Boeing 737 flight control computer software: there is exactly one use case for the software and the code paths in it, the Boeing 737. And this is where you need the maturity of your systems working on your infrastructure for a long time. Because no one else is doing this job for you. You know the system is doing the right thing because your experience is that your code has worked reasonably well on the chosen processor for the last 30 years. And the fact that many others are using other processors for other applications or codes is perhaps interesting information, but it has zero value to you as a designer of critical systems. You start from zero.

            I didn’t say “Don’t fix it if it’s not broken” because this would be nonsensical anyway. You have nothing to fix if it’s not broken. However the two fundamental rules of safety critical systems are basically “You shall not introduce change into a known working system” and “You shall introduce only minimal change if it becomes necessary”.

            The 737 disaster is an example of this. The MAX necessitated change to a known working system and somewhere along the way, problems were introduced. The problem with the MAX shows why you don’t change a running system: when you change it, the cards are shuffled anew and parts of your system are reset to a state of almost no maturity. Probably it would have been better for such an old system as the 737 to find aerodynamic ways to counter the problem in order not to touch the computers inside the aircraft. However, for whatever reason this was considered impossible or undesirable.

            There is an old aphorism: “If you hear hoofbeats, think horses not zebras”. This has helped me over more than 20 years of experience in designing and troubleshooting complex systems. At the moment we know almost nothing about the problem in the 737 MAX, and of course as soon as you hear there is an 80286 running, you think it’s one of the problems because it seems not to be powerful. However that is a zebra diagnosis. And think about the A320 running on less capable CPUs. A horse diagnosis would be that they have totally borked the priority handling, that there are processes that totally hog the CPU under certain circumstances (and they would do this with a faster processor as well). But that’s all speculation and assumptions.
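The "horse diagnosis" above, that a priority-handling fault would hog a faster processor just the same, can be illustrated with a toy fixed-priority scheduler. Everything here is hypothetical: the work amounts, the `speedup` parameter and the function name are invented for the sketch and say nothing about the actual FCC software.

```python
def response_time(speedup, hog_runs_forever, horizon=1000):
    """Ticks until a lower-priority 'pilot command' job finishes.

    A higher-priority job always runs first. `speedup` scales how fast the
    processor gets through work. Returns None if the pilot command job
    never finishes within `horizon` ticks.
    """
    pilot_work = 10 / speedup
    # A well-behaved high-priority job needs a finite amount of work;
    # a faulty one (the assumed bug) never finishes and never yields.
    hog_work = float("inf") if hog_runs_forever else 50 / speedup
    for tick in range(1, horizon + 1):
        if hog_work > 0:
            hog_work -= 1          # the higher-priority job gets the CPU
        else:
            pilot_work -= 1        # only now does the lower-priority job run
            if pilot_work <= 0:
                return tick
    return None


print(response_time(speedup=1, hog_runs_forever=False))     # slow CPU, healthy software
print(response_time(speedup=10, hog_runs_forever=False))    # fast CPU, healthy software
print(response_time(speedup=1, hog_runs_forever=True))      # slow CPU, hogging bug
print(response_time(speedup=1000, hog_runs_forever=True))   # fast CPU, same bug
```

In this toy model a faster processor shortens the response only when the high-priority work is finite. If that work never yields, the pilot command is starved at any clock speed.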

            My assumption for the future is that we won’t see general purpose chips used in flight control features or really safety critical systems, but reduced complexity processors based on industry standards. For example simple but sufficient ARM cores used with the minimal necessary surrounding circuitry to solve your problem in a given situation. Not a processor that is designed to run Fortnite or Grand Theft Auto. Fabless manufacturing and IP cores available for many processor technologies make this possible. But I may be incorrect on this. Predictions are hard, especially about the future.

          • The older processors have another advantage in terms of real-time control functions. The timing of events is slower but also more predictable. I remember when the 386 processor was introduced, there were some issues with control software because the processor might internally prioritize and change the timing of some events. So you gain speed but as Joerg says, the tricks used to do that may introduce some timing anomalies.

            I remember that some of our software had to be re-written to maintain its own internal clocks and event schedulers, on the 386. But the added speed of the 386 made that possible. So it was a trade-off. Still I could see that for avionics, using the older known predictable processor might be favored.

    • Yes, 1/1000 of the ‘speed’ of a mobile phone, but have you seen all the software that’s running in the background on a modern phone? Then there are all the apps loaded in, and finally the large full colour screen with touch functions, a processing hog.

      • And I’m sure you will keep adding Apps until it stops running. That’s what Boeing have done.

        As they say, you can’t put a quart into a pint pot!

        As another commentator said, NG accidents that have been attributed to pilot error need to be re-examined. In other words, this isn’t just a MAX issue, it’s an NG issue.

  23. Since the Boeing 737 MAX was banned, Thomson has continued flying one of the higher performance aircraft in the aviation industry, leaving aside any discussion of the arrogance of Boeing technicians and executives. Examples of daily flights that accumulate flight hours and experience: Thomson 576, 576, 2537, 1545, 2519 and 857. An obvious reading is a hidden intention to disqualify one of the manufacturers whose aircraft type has moved more of the world’s population than any other; I mean the Boeing 737, 737 Classic, 737 NG and 737 MAX.

  24. One scary side note to this whole affair is this company that created MCAS is bringing all its avionics in house!!

      • So after selling off Boeing Commercial Electronics (BCE) in the 80s, now they want to get back into the field.

      • Most likely made by the likes of Collins or Sperry/Honeywell who would have made the autopilot. (IIRC original 737 was SP77, then SP177.)

        Hardware and software, to Boeing requirements and Boeing integration testing.

        I know that the original 737 had many accessory units made by Boeing itself, IIRC some containing cards from other makers (proximity switch interpretation for example), some cards made by Boeing or a direct subcontractor.

        The latter were counterfeited by Air Repair, one of its partners went to prison over that. Ruined a good business, to which Boeing people were referring customers for modification of one configuration to another. (I don’t know why a ‘personality module’ or such was not used in the airplane wiring, as done with a dongle chained to the rack of the 707 & 727 for the CADC. But units may not have been fully populated with cards so needed some added.)

    • Mark, interesting thoughts. First outsourcing and then insourcing, or: let us focus on the main subject, build/assemble aircraft – and let others build the ‘parts’ – something that they do better than us. Or, let us do everything and get all the earnings. The industry has, through history, done a lot of the above. On top of this comes ‘let us broaden our product line’. Boeing, I believe, took the outsourcing farthest when they started with the B787. From then on, the practice was reversed.

      Typically, the B737 fuselage is outsourced to Spirit, which used to be a Boeing company. I like to say that constructing and building all parts of an aircraft is WORK, whilst selecting parts, integrating them, putting them together, and testing the whole thing is ART.

      When aircraft got more and more complex, some manufacturers forgot the ARTists. Building a bit here and a bit there can prove the proverb ‘a little hump can tip over a large load’ correct. I wonder what the workers in France thought when the Airbus units arrived from Germany and the cables were a couple of inches too short.

      Why, one may ask. The MAX story apart, I feel that we can take this back to the Boeing-McDonnell Douglas merger. Boeing won the name and head office location competition, whilst McD/DC won most of the top management and business style. In a way, a merger of business people from Long Beach and engineers from Seattle.

      In my view it is correct to do the system integration in-house, with outside assistance when required, and let others do (outsource, if you like) the bits and pieces. Today’s aircraft are very complex and development goes fast. Some focus is needed.

      • Re. to Svein,
        I really appreciate Bjorn’s article and ability to analyze the situation as both an aeronautical engineer and a pilot. And for your interesting comment, Svein. Your point about Long Beach, may have an additional angle.
        When the 737 was upgraded from 40” to 60” dia. engines, the aerodynamic analysis was done by the team from Seattle. The theoretical aerodynamic analysis of the MAX and the influence of its larger 68” dia. (and then almost 70”) engine was done by a new team, the one from southern California/Long Beach. To my knowledge, no one from that Seattle team participated in the theoretical aerodynamic analysis of the MAX. I will be happy to hear from anyone who knows more.

        • Hi Ran, as we search for more details, here is an interesting piece from this web-site

          You will find this story and more if you search the archives – this story is from July 21 in 2011.

          McDonnell’s introduction to civil aviation is interesting. First the company from St Louis ‘took over’ the civil aviation manufacturer Douglas in Long Beach. Then the merger with Boeing came up in the second half of the 1990s (and the rumors say: with the US government as marriage broker). Then, as the saying goes in Seattle, they bought Boeing with Boeing’s money.

          Destroying one civil aviation manufacturer, and almost the second one, is a tragic story of its own.

          Read the link, and you will perhaps understand why things are what they are – enjoy.

          • That’s silly. For financial reasons a takeover can be structured that way; many US companies used a similar process with a small overseas company that ‘appeared’ to take over the much larger business in order to move the domicile of the business offshore.
            You seem to gloss over how badly run Boeing was at the time of the takeover. That was the time of the production line halt because everything was in a shambles, sales were made by offering massive discounts. ….

          • Svein, thanks for the link. I would like nothing more than to prevent another tragedy. Perhaps our conversation will be helpful in that regard. I worry that too much emphasis has been placed on the flight control software and more fundamental design shortcomings have been overlooked.
            It is good to learn from the link you provided that as early as 2011 Leeham News and the Seattle Times followed the development of the MAX.
            From Bjorn’s earlier article, see
            it seems that the top of the MAX nacelle is above the wing chord line. Perhaps higher than any other airplane? Could this be THE problem?

          • Hi Dukeofuri, thanks for comments. Being a reader/subscriber to Fortune and The Economist Magazines since (+/-) 1980, I remember many interesting articles related to (among others) the aviation industry.

            The Boeing/MCD merger created a lot of discussions, and to an extent controversy, in the new company. On top of everything Phil Condit ‘made a fool of himself’, and had to quit. His successor, Harry Stonecipher, managed a few turnarounds in the company, before he as well had to quit (following a not too smart act). I agree that a shake-up was needed at Boeing, but perhaps HS went a bit too far.

            Boeing famous for being an ‘engineers company’ ran into challenges when HS took over, an HS that tried to live up to his mentor, GE’s Jack Welch’s, business style. I remember HS once said; ‘if I ask an aerodynamics question at Boeing, I will have ten engineers at my office the same day; but if I ask a financial question, I will get one (wrong) answer a month later’. Perhaps a sidetrack, but worthwhile mentioning – it’s all about business/engineering cultures.

            But to claim that Boeing lacks the technology resources is wrong. Looking at all the technologies they are involved in, should ‘guarantee for that’. But with engineering offices at many locations, it is perhaps difficult to work together.

            When it comes to Boeing’s responses to the accidents, I feel that the U.S. compensation laws play an important role. The company weighs their words carefully. I feel certain that many more lawyers than engineers are involved in the aftermath of the two accidents. (Perhaps another sidetrack: search the web for ‘LA slip and fall attorneys’, and you see what I mean.)

      • Boeing has been outsourcing since almost the beginning – Hal Korry left Boeing to make things like annunciator lights; for years his company was the pre-eminent supplier of them.

      • Mike Sinnet is the only one I am surprised is not listed under Commercial Aircraft section. He was (perhaps formerly) the VP of Product Development & Strategy. Almost all others are still listed.

  25. from reading of the tea leaves, the new computer issue involves a non-MCAS system that could cause the plane to dive. The new FBW spoiler system, on the MAX, comes to mind. Maybe the new “elevator jam landing assist” or some other part of the spoiler system? Any thoughts?

    • The speed needs to come down to make the elevators operable. Deploying the spoilers will bring the speed down. But deploying the spoilers takes away the lift, thereby acting as an AND action.

  26. It would appear that the issues associated with the 737 MAX are a result of design ‘mission creep’. Over the years the stabilisers increased in area without a corresponding increase in elevator area. This, coupled with a reduction in trim wheel diameter, has made it physically very difficult if not impossible to manually adjust trim at higher speeds. The addition of larger engines to the 737 airframe necessitated their positioning higher and further forward than would be ideal and this has resulted in pitch up instability at high angles of attack. MCAS was introduced to provide an active anti stall system at extreme pitch up.

    Putting aside for the moment the merits of the above, it was absolutely essential that Boeing implemented MCAS competently. Instead, 346 lives lost provide clear evidence to the contrary.

    The problem for Boeing is that they look guilty of a coverup. The very wording of MCAS, it can be argued, is designed to throw any overseeing authority off the scent regarding flight instability and stall risk. To make MCAS related equipment ‘optional’ was again designed to minimise MCAS’ role. Pilots were not informed of its presence and it was excluded from flight manuals. At every turn Boeing has taken efforts to obfuscate the presence and role of MCAS.

    The latest MAX computer processor/software issue appears indicative of yet again asking more of a system than it was originally designed to provide. I am sure, given time and money, that these issues can be fixed by Boeing; however, that they occur at all suggests that the mission creep of tweaking old designs has limits.

  27. Some time ago I read one of Bjorn’s extremely interesting educational corners and I have some recollection that he stated that one of the advantages that the A320 enjoyed over the 737 was smaller vertical and horizontal stabilizers owing to its FBW system. I don’t suppose anyone remembers where to find it? I am wondering if this might be the real problem MCAS is intended to solve, mistrim problems occurring too early owing to the elevators being too small.
    It seems unlikely, because none of the experts have mentioned it as a possibility and I am definitely not an expert. It would also be the opposite of what we have been told so far. If true it would presumably apply to the NG as well. What is the mysterious low speed handling problem that required the MCAS supercharger?

  28. I’ve been waiting for these details to surface … Is it really “Boeing” writing the software for the flight control computers? It certainly doesn’t remind me of the Boeing of yesteryear, the engineers who designed B-52s, 747s etc. It sounds like a few levels of outsourcing, with the associated problems of coordination, communication and control.

    • Absolutely true. With a degraded FAA, the big corp. all the time will outsource to LCCs (Low Cost Country(s).) Did Boeing learn anything from the B787 LCC fiasco? At the now largest supplier to Airbus and Boeing, I turned to a co-worker and asked: What do those folks do in that cube? He answered they’re software people who fix code that comes back from India. About this time, a one level manager came by and I heard him say he was told by a three level manager to come up with more positions for LCCs.

  29. Since this is the 50th Anniversary of Apollo 11 it is interesting to remember that the computer on the lunar lander became overloaded on the descent and was throwing out error codes which had to be ignored for a successful landing.

    Some things never change.

  30. Trying to hide safety problems was ordinary BA politics already 30 years ago. On the 20th of May this year Niki Lauda died. If Lauda Air 004 was comparable to LA, only his vigorous, energetic and selfless work forced BA to admit that the problem was the valves of the reverse thrust system. So no second crash had to happen.
    “Lauda asked for a press conference the following day, and told Boeing that if it was possible to recover, he would be willing to fly on a 767 with two pilots and have the thrust reverser deploy in air. Boeing told Lauda that it was not possible, so he asked Boeing to issue a statement saying that it would not be survivable, and Boeing issued it. Lauda then added, “this was the first time in eight months that it had been made clear that the manufacturer [Boeing] was at fault and not the operator of the aeroplane [or Pratt and Whitney].”
    Lauda for sure was a one in a million personality with high engineering knowledge. He is far too seldom remembered for his commitment to aviation safety. But to me this is just a mirror from the past showing nothing has changed. RIP

  31. Spare a moment for airlines like Southwest and Ryanair whose future is heavily dependent on the 737MAX and the pilots who have to fly it. You can do all kinds of software algorithms but until hardware problems are fixed it will inherently be a (“fatally”) flawed design.

    Even if these airlines wanted to add Airbus aircraft, with the huge backlog it won’t be possible?!

    Mr. Walsh’s decision to consider the 737MAX carries a big question mark but, should airlines with big fleets (300+ single aisles) be dependent on one OEM for their aircraft?

    • The Airlines are 100% to blame for this predicament. They could have placed large orders for the excellent Cseries, and now they can only buy it as the A220. You make your bed you sleep in it. I guess it will be Comac or MS-21?

      • Who do the leaders of LUV play golf with? The vast majority of their fleet is 737-700s. Over 500 planes. Granted there is substantial up-sizing to MAX8s, but the closest in size to that plane, the A220-300, cannot be used. These leaders have made decisions. These leaders have visions. Visions of flying MAX8s over vast bodies of waters.

  32. At last the FAA is doing its job and, rather than just taking Boeing’s word, is actually testing safety and failure related scenarios. It was reported Boeing had seen and rated the found issue TWO levels below catastrophic. One might accept some difference of perhaps one level (which should have been found and challenged long before now), but two? It would seem the self-certification regime agreed by the FAA with Boeing will need to terminate as the manufacturer seems now proven incapable of realistically evaluating its own product. Surely this should have been foreseen and when the FAA gets its additional funding and resources we can all hope for better safer airplanes in future. We do need Boeing to be making fine flying machines, but not ones that pilots simply cannot control under some circumstances, whatever they are. To have allowed their engineers and software specialists to produce a flawed airplane will now cost Boeing the time they should have put into making the 737 MAX properly in the first place. Deservedly so; nothing good was ever made in haste. Boeing seemed to have forgotten that. Even led by someone of significant Engineering background (so they don’t even have the Accountants to blame). Some more heads need to roll, surely.

  33. Joerg,

    Sigh indeed!

    In the end, I did look it up. It’s 20 bit (1 MByte) in real mode and 24 bit (16 MBytes) in protected mode.

    My comment that 16 bit CPUs can have 24 bit memory addressing came from my background knowledge. Specifically, I’m aware of memory segmentation techniques.

    To help you, I studied aeronautical engineering and then entered a career in Computer Aided Design (CAD). I didn’t use CAD systems, I designed and developed CAD systems. I’m now retired.

    Sorry, not much you can teach me about computers.

    I didn’t have specific knowledge of the 80386. So I took the article that I gave the link to as gospel. I’ve now looked it up. It is 16 bit addressing but uses memory segmentation to provide extended memory addressing. Memory segmentation takes up CPU cycles, lots of them.
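The address-space figures quoted in this comment follow directly from the width of the address, and the segmentation it mentions can be illustrated with the real-mode segment:offset rule of the x86 family. What follows is only an arithmetic sketch: the helper names are made up for illustration, no wrap-around masking is modelled, and nothing here describes how the 737 FCC software is actually organised.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def real_mode_physical_address(segment: int, offset: int) -> int:
    """x86 real-mode rule: the 16-bit segment is shifted left by 4 bits
    and added to the 16-bit offset (illustrative helper, no masking)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


MIB = 1024 * 1024

print(addressable_bytes(16) // 1024, "KiB reachable with a bare 16-bit address")
print(addressable_bytes(20) // MIB, "MiB reachable with 20-bit addressing")
print(addressable_bytes(24) // MIB, "MiB reachable with 24-bit addressing")
print(hex(real_mode_physical_address(0x1234, 0x0010)))
```

The point of the sketch is only that a 16-bit register can reach more than 64 KiB when two registers are combined, at the cost of extra address arithmetic.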

    It’s all irrelevant. 1 MByte or 16 MBytes. Useless for what is necessary from an engineering point of view. Take it from an engineer.

    It does mean I now understand what’s been happening. Boeing have been ‘trading’ engineering logic because of CPU limitations. I do have to admit, I’ve done that, but nothing critical or serious. I hope.

    I gave a simple example. There is no logic to identify an AoA of 75° as invalid. In the Ethiopian crash, the AoA registered 75°. From an engineering point of view, an AoA of 75° is invalid. If there had been logic, then MCAS would not have engaged and everybody would be alive.

    To generalise, the FCC doesn’t appear to have floor (minimum) and ceiling (maximum) logic. To do that requires detailed charts. The CPU doesn’t have the memory for such detailed charts, even with 16 Mbytes. So the logic was traded out for other ‘more important’ logic.

    This is what has been happening. Boeing engineers have been refused the right to upgrade the CPUs in the FCCs. So what goes into the FCC is a trade-off for there isn’t much room and the processing power is very, very limited.

    With regard to the current issue, Boeing referred to it as major. The FAA referred to it as catastrophic. So Boeing traded the problem out, the FAA have traded it back in. How Boeing will fit it in, I’ve no idea. Boeing will have been optimising the code for decades to fit within the CPU limitations. There is probably no optimisation left.

    The big question for me is what else has been traded out because it would not fit within the limitations of the CPUs? From the current reports into both crashes, I can think of a lot. An awful lot.

    Returning to stability. The root cause of all this is that the 737 MAX does have stability issues. That doesn’t mean the airplane is unstable, it means it is relatively easy to make the airplane unstable. Hence, MCAS.

    Anyway, computers. They are safer nowadays, not less safe. But, I’m an engineer. So I look at it that way. Specifically, engineering logic takes precedence over everything else, for if it’s not there it won’t work, regardless of what is said about the computer.

    Boeing, a lot of engineering logic just isn’t there. Sigh indeed.

    With regard to the faux pas with respect to the memory size on the 80386. I should have looked it up as opposed to trusting the link. But it’s irrelevant. It’s still useless.

    • It’s also an 80286, not an 80386. Oops. But, I did look up the right one.

    • Philip, isn’t the stability issue related to the aircraft being at low weight (almost empty), with the CG at, or close to, the aft limit? In this situation the stability becomes too low (CG being too close to CP/CL). Then at high AOA, the oncoming ‘wind’ will hit the fuselage and engine nacelles, and flip the feather like aircraft over (to be a bit theatrical).

      Since MCAS is active with flaps up only, one may assume that flaps down, increase the above mentioned margin enough to make the aircraft ‘safe’.

      If the above assumptions are correct, the next question is: why isn’t MCAS ON only when required? Which seems to be in ferrying operations.

  34. Philip, you’re drawing a lot of conclusions here. We don’t know where the avionics are in terms of maximum events loading. I don’t know, but would assume, that the FAA looks at processor utilization and would flag a system that was running at its limits. That would also show up in testing, as it did in this case, and probably in actual flight as well. There have been no reports of that. This was a test case that forced the FCC into a lower-capability state. It’s valid and needs to be addressed, also addressing it will probably mean a complete FMEA re-test in the simulator, as it impacts event handling. That will take time for sure.

    We also don’t know that Boeing intentionally swapped out limits testing for MCAS because the FCC couldn’t handle it. Obviously limits testing is in place for other things. It’s in place for MCAS too now, but whatever was done to the software will need to be economized to use less resources.

    If you are right, that task will be impossible, so we’ll have to see what happens. The proof will be in whether it can pass the FMEA.

    • No we don’t know everything … indeed, almost nothing because Boeing won’t tell the truth,

      What we do know: the FCC took the view that a 75° AoA was valid and crashed the airplane. I’d have been thrown out of university for that.

      What else don’t we know? Lots

    • But where the heck was the FMEA that tested the idea that data from a single AOA sensor could be corrupt?

      That test seems never to have been performed….

  35. As a passenger: At this point I would feel safer flying in the MAX if they pulled the MCAS system altogether and just trained the pilots in a simulator, on the aircraft’s tendency to pitch up under certain conditions. Analysts seem to agree that MCAS is only needed to avoid expensive simulator time and permit training by tablet, but that the aircraft is not inherently unstable or not airworthy otherwise. Would that get them back into service more quickly?

  36. Does checking sensor input data for plausibility really need much computing power? Or the other way around: A flight control system that is too stupid to flag obviously wrong sensor data does not sound to me like something that should be accepted in the first place.

    I understand that flight control software and regular code as it is written for smart phones or computers can not be compared. But given the cost and amount of people supposedly involved in this type of software/system, the result appears to be shockingly low tech.

    • Ralf, I believe the upgraded MCAS software will look at the analogue AOA values, and ‘filter out’ values that are obviously wrong, i.e. being off-scale. Another possible filter is to look at rate of change, i.e. values that may be logical, but change value too fast. The latter is not being implemented (as I understand). The filtering could be done before the FCC, for example in the ADIRU. Then the FCC work could be limited to work with real values, and receive just a valid/not-valid bit (in addition to the analogue value when valid).

      It is a bit wrong to compare with cellphone software. My cellphone is getting application updates every day; some are fixes and others are new features. This methodology won’t work with an FCC. I agree with you that the whole FCC upgrade stuff seems like an endless project. I would guess that the engineers-programmers involved in the work had all the knowledge needed. But as I said in another comment: if we had as many engineers on the project as we have lawyers, then the whole thing would be ‘a piece of cake’.
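The two filters described in the reply above (an off-scale check and a rate-of-change check, producing a value plus a valid/not-valid bit) can be sketched in a few lines. This is a minimal illustration only: the limits, class names and the choice to remember the last accepted value are all assumptions made for the example, not anything taken from Boeing's or a supplier's software.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits, chosen only for illustration.
AOA_MIN_DEG = -20.0
AOA_MAX_DEG = 40.0
MAX_RATE_DEG_PER_S = 15.0


@dataclass
class AoaSample:
    value_deg: float
    valid: bool  # the "valid/not-valid bit" passed on with the value


class AoaPlausibilityFilter:
    """Flags off-scale readings and readings that change too fast."""

    def __init__(self) -> None:
        self._last_good: Optional[float] = None

    def check(self, raw_deg: float, dt_s: float) -> AoaSample:
        # Floor/ceiling check: reject values outside the plausible range.
        if not (AOA_MIN_DEG <= raw_deg <= AOA_MAX_DEG):
            return AoaSample(raw_deg, False)
        # Rate check against the last accepted value.
        if self._last_good is not None and dt_s > 0:
            rate = abs(raw_deg - self._last_good) / dt_s
            if rate > MAX_RATE_DEG_PER_S:
                return AoaSample(raw_deg, False)
        self._last_good = raw_deg
        return AoaSample(raw_deg, True)


f = AoaPlausibilityFilter()
for reading in (5.0, 74.5, 5.5, 12.0):
    print(reading, f.check(reading, dt_s=0.1).valid)
```

Neither check needs lookup tables or much memory. A real implementation would also need a way to recover after a genuine fast change, since this sketch keeps comparing against the last accepted value.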

      • While 75 degree AoA is off any scale, filtering out wrong data is easier said than done. The AF447 A330 lost in the mid Atlantic was partly caused by a computer system which said that no sane person will fly the aircraft like this, these numbers must be wrong. If MAX’s processors are on the limit, something similar could happen. As another poster has pointed out, including the necessary performance graph might be beyond the processor.

        • Martin, the AF447 accident started with the aircraft flying into bad weather at high altitudes near the equator (when the others flew around the ‘dangerous clouds’). Then (some) pitot tubes froze; airspeed was lost and the autopilot disconnected. Then the airspeed came back, systems worked as they should, but the pilot kept pulling the sidestick all aft. The aircraft ‘descended’ all the way to the ocean. The pilots were not trained in – had little knowledge of – ‘high altitude stalls’.

          Keep in mind that the stickshaker starts to function when the weight-on-wheels switch shows ‘flying’. On the two Lion Air flights the left AOA showed 22+ more than the right side even during taxiing and takeoff roll. A ‘too high difference’ warning could have been given (long) before V1.

          On the EA flight both AOAs showed normal values during takeoff roll, but the left one went off scale at rotation. Could easily be detected as an error, and sensor ‘disconnected’ (‘too fast change of value’).

          When one sensor shows wrong values and all others show normal values, it should be easy, for the system, to find and ‘filter out’ the bad one (using attitude, air speed, vertical speed, the INS). Already implemented on some aircraft. And I have not forgotten that an erroneous AOA sensor will influence the IAS and ALT readings.
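The cross-checking idea in the comment above (a left/right disagree warning, and picking out the one source that does not match the others) can be sketched as follows. The threshold and function names are hypothetical, and the third source is assumed to be some independent AOA estimate; this is an illustration of the principle, not the logic used on any aircraft.

```python
from statistics import median
from typing import List, Sequence, Tuple

DISAGREE_THRESHOLD_DEG = 5.0  # hypothetical value, for illustration only


def aoa_disagree(left_deg: float, right_deg: float,
                 threshold_deg: float = DISAGREE_THRESHOLD_DEG) -> bool:
    """True when the two vanes differ by more than the threshold."""
    return abs(left_deg - right_deg) > threshold_deg


def vote(sources: Sequence[float],
         threshold_deg: float = DISAGREE_THRESHOLD_DEG) -> Tuple[float, List[int]]:
    """Median vote over three or more sources.

    Returns the median and the indices of the sources that deviate
    from it by more than the threshold.
    """
    if len(sources) < 3:
        raise ValueError("voting needs at least three sources")
    mid = median(sources)
    outliers = [i for i, v in enumerate(sources) if abs(v - mid) > threshold_deg]
    return mid, outliers


# Left vane reading far above the right vane, as described for the takeoff roll.
print(aoa_disagree(24.0, 2.0))

# Left vane, right vane, and an assumed independent estimate.
print(vote([74.5, 4.8, 5.1]))
```

With only two vanes the system can detect a disagreement but cannot tell which side is wrong; the vote needs a third, independent source to isolate the faulty one.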

  37. I wonder if anyone at the FAA knows why MCAS was increased in power, and activation situations, during flight testing in 2016? Do they know what the “less predictable” handling characteristics are that Boeing test pilots experienced near the stall, that precipitated the increase in MCAS software? Is it just Angle of Attack that MCAS uses as input? Is it just the stabilizer that is changed, or is the yaw damper stall system triggered (as evidenced by the stick shaker going off) also? During FAA testing, did they just observe test flights, or did they actually review the MCAS code (which may have shown only one AOA as input etc). Are they now reviewing any of the actual MCAS computer code, or only the physical and simulator tests? Without being able to review the actual computer code, you’d have to do a lot of “black box” simulated testing to figure all of the input/output combinations. I’m beginning to doubt that anyone in the FAA actually has the actual FCC computer code.

  38. Boeing has its head in the sand. They want to fit a square peg, in a round hole using just software.
    They wanted to build an A320neo, with its big, new, fuel efficient engines, and squeeze them under a 737 quickly, and cheaply.
    But, Airbus is light years ahead of Boeing in designing flight control systems.
    Boeing 737 MAX designers think One AOA sensor is enough where Airbus, and most every other Boeing design, now uses Three.
    Boeing thinks it can just hang a jet engine a bit more up and in front of the wing with no consequences.
    But, their test pilots say it won’t pass inspection, so they ‘patch’ up a speed trim system to hide the tricky stall entry problem, calling it an augmentation system,
    rather than vortex generators or stall strips, that their test pilots suggest because that would limit fuel efficiency and raise costs and take time.
    Then, their software ‘patch’ actually contributes to two air crashes, with their great one AOA sensor design, which had already been redesigned during
    testing, to be more powerful, but, someone forgot to put any limits to its activation, so it can push a plane’s nose down, every Ten seconds, forever.
    But, why bother the pilots with minor details like that. Just put an AOA disagree light into the cockpit for a small extra charge and we’ll think about
    updating the software to make it work sometime later, in a future software release.
    There’s no OFF switch for the pilots to turn off the poorly designed MCAS system, so they have to ride the plane down into the ground.
    There’s no way to manually trim out of the situation, if it gets too far gone, because the trim wheel has been shortened. There’s no way to electrically trim out
    of the situation, without the MCAS software activating, because Boeing redesigned the Two stab trim cutout switches to operate as One.
    Is Boeing fixing the manual trim wheel problem? Rewiring the stab trim switches back to Two? Putting vortex generators on the aircraft to fix the tricky stall?
    NO, they are trying to patch software to make all these problems go away. They’ve had such good luck with that in the past. It’s quick, it’s cheap and they don’t
    have to show any redesigns to their FAA DERs. Just fly a few flight tests, run a few simulator examples, and show the pilots a few words about MCAS. I’m perfectly
    satisfied. Of course, first they have to upgrade the CPU, because it’s failing in the FAA simulator tests but, they can probably get around that using a cheap
    and quick software patch.

    • Is MCAS being linked to one AOA sensor, with the pilot pulling a circuit breaker as the redundancy, meeting grandfathered 737 certification requirements?

    • Head in sand? Nope, collectively it is where the sun don’t shine and a physical impossibility.

      BTW 787 (and maybe 777) do not use 3 AOA sensors AFAIK

      At least the 787 uses a separate system ( inertial ) as a backup- compare of dual sensors like pitot, alt, pitch? ,etc

      Tom Dott ISASI Sept 2011 ” introducing the 787 ”

      – look at pages 38 and 42 re INS compare to normal sensors

      Seems to this SLF such an additional black box could be retrofitted to all NG and MAX (not cheap but IMHO a better solution and costing only a few billion)

      And of course still have to figure out how to handle the near useless full manual trim wheel when ALL the HAL controlled goodies are turned off and or default to leaving ONE pilot in charge as required/practiced for a few decades. Said pilot not required to be a clone of Hoover, Yeager, Tex Johnson or Ernie Gann. This so the unknown Genie (named MCAS ) does not unzip fly and urinate on the current pillars of management !

    • I just noticed something that I missed before.

      From Wikipedia on the 787:

      “With the same wing but a longer fuselage than the -9, the flutter margin was reduced for the -10 but to avoid stiffening the wing or adding wingtip counterweights for commonality, software oscillates the elevators in the flaps up vertical mode suppression system (F0VMS), similar to the vertical gust load alleviation system.”

      See a trend here ? Something that would/should have been fixed using engineering, and aerodynamics is now ‘solved’ using software.

      Given what we’re now beginning to understand about the software that had been implemented on the 737-MAX, I do hope that ALL the boundary conditions have been thoroughly tested with this system on the 787-10 !

  39. Bubba,
    I relied on a posting for the bit about all other Boeing Aircraft having 3 AOA sensors ..
    BTW, All Boeing airplanes other than the 737 use triple-channel systems (L-C-R) (747,757,767,777,787) with three independent sets of sensors.
    The 737-100/-200 only had one AoA vane.
    It sounds like the Boeing 777/787 uses Two AOA’s and One inertial reference to feed the Three independent systems, not Three AOA’s.
    What do I know? I’m lucky if I see a piece of string hanging out the window (grin). (I used to fly small aircraft)
    As for the trim wheel, why not some sort of mechanical advantage, if needed. A simple geared crank to engage the trim wheel out of its hard to turn region.
    I know, when seconds count, this isn’t much, but, it’s something. Or reconnecting the stab trim cutout switches as they were on the previous versions of the 737?
    This is a real head scratcher, as to why Boeing made a change on the stab trim cutout switches, along with the yoke override switch. It’s almost as if they
    want the pilots trapped with no options. I haven’t heard a good reason for the wiring change. Or, why it’s not being retrofitted back to the original NG.
    Current draw per switch? Boeing is standing on its head, not to allow any hardware changes at all. I think that’s a mistake, long term, with 5000 orders lined up.
    I’d rather it be done right, rather than software ‘patched’.

    • Umm, the IRS would have to calculate AOA from parameters including attitude and vertical speed.

      AOA is angle of _airflow_ over the wing.

      Perhaps a good approach if accurate, as it uses different sensors (attitude and air data).

      Beware that AOA may be adjusted by computers to get better accuracy if the vane itself does not accurately reflect wing AOA. (Siting of the vane could be a challenge.)

      Note that stall (which is what MCAS is intended to prevent by helping the pilot avoid over-controlling as control forces lighten from the effect of the large high nacelles on airflow over the wing) depends in substantial part on wing flap deflection.

      Interesting that the original 737s had only one AOA vane, never thought about that when working around the airplanes. But it was not a Cat III airplane.
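The calculation this reply alludes to, deriving an AOA estimate from attitude and vertical speed rather than from a vane, can be sketched with the textbook relation AOA ≈ pitch attitude minus flight path angle. The sketch assumes wings-level flight in still air and that true airspeed is available from the air data system; the function names are made up for illustration and no claim is made that any Boeing system computes it this way.

```python
import math


def flight_path_angle_deg(vertical_speed_mps: float, true_airspeed_mps: float) -> float:
    """Flight path angle from vertical speed and true airspeed (still air assumed)."""
    if true_airspeed_mps <= 0:
        raise ValueError("true airspeed must be positive")
    ratio = max(-1.0, min(1.0, vertical_speed_mps / true_airspeed_mps))
    return math.degrees(math.asin(ratio))


def estimated_aoa_deg(pitch_deg: float, vertical_speed_mps: float,
                      true_airspeed_mps: float) -> float:
    """Wings-level approximation: AOA = pitch attitude - flight path angle."""
    return pitch_deg - flight_path_angle_deg(vertical_speed_mps, true_airspeed_mps)


# Level flight at 3 degrees nose up: the estimate equals the pitch attitude.
print(estimated_aoa_deg(3.0, 0.0, 120.0))

# Climbing at 10 m/s with 120 m/s true airspeed and 8 degrees of pitch.
print(round(estimated_aoa_deg(8.0, 10.0, 120.0), 2))
```

Such an estimate depends on airspeed and on there being no strong vertical gusts, so it is a cross-check on the vanes rather than a replacement for them.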

  40. In the final seconds the ET 302 pilots turned the stab trim switches back on and used electric trim to pitch up. But it didn’t work. Why? Was the FCC overwhelmed at that excessive speed (340 knots)? Or perhaps both pilots tried to trim at the same time and that overloaded the CPU? Really can’t understand why electric trim failed to get the nose up on the last attempt the pilots made.

    • Sid, are you sure? According to the Preliminary Report, text part, the following took place at 05.43.11; – ‘two momentary manual electric trim inputs are recorded in ANU direction. The Hstab moved (up) from 2.1 to 2.3 units’. This took place shortly before the end of the recordings.

      So it moved. But the question is clearly ‘why didn’t they hold the trim switch for a longer period?’ From the graph in the appendix it looks like they gave

    • Thank you for the link Richard.

      I agree he makes some very salient points.

      If I was in charge of hiring programmers to write software to control a FBW aircraft, I would be looking for people with current PPL, instrument rated minimum.

      A pilot would recognise that a 75 degree AOA would not be a sensible input to a command. They’d also be very wary of allowing the stabiliser to move to full nose down, or full nose up.

      Of course you have to also make sure that middle management are not allowed to overrule the programmers. You hire, and pay for experts for a reason.

      • I think the programmer analyst on this assignment was heads down looking at his GPS rather than watching the mountain getting larger, in the windshield. A lot of programming focus after the Enron scandal has been to get the documentation right and who cares if the program works or not. It will be really interesting to see the chain of events as to how MCAS got into production. It may be a not enough time to do it right the first time, but, always enough time to fix it afterwards. Boeing was pushing to get this plane out the door quickly, and they own the FAA. I can’t wait for the books that are sure to be written on this fiasco. It may be that the coders on this were hired guns, for just this assignment, and the programmer analysts were not well versed in aircraft flight controls. But, it appears that very few people had reviewed MCAS, especially from a flowchart logic perspective.

        • Certainly have to pay attention and understand what one is doing, despite the various processes.

          Much knowledge about, though some experts are naïve, I just came across a book titled something like TQM for software but haven’t had time to look at the TOC. (Twenty cents in a government surplus store.)

          TQM is a vague term, Total Quality Management. My take of a fancy Honeywell video about whatever they thought TQM was is that the much vaunted 6-sigma process was just empowering employees – managers had to listen and discuss, thus good ideas were implemented and rejections explained. (Sometimes ideas don’t fit the big picture including impact on other employees. Of course in a quality company deadwood had been fired – some Boeing 787 people come to mind, and dishonest people definitely fired – some Boeing and supplier people on the 787 program should have been.)

          BTW, the term ‘Lazy B’ in my mind comes from observations of some people decades ago, sitting at their drawing board, not hustling to get work done.

      • I wouldn’t go quite that far, some engineers are good at learning from pilots. Boeing flight deck people were good decades ago.

        You also want people who have done safety analysis.

        And there are excellent programmers, people who have experience and perspective and care. Unlike the reputed twit at a cocktail party who said it would be a great job without the users.

        You want people who understand how to design software for reliability. There are guidelines such as DO-178x.

        Key is that both programmers and managers listen to each other, starting of course with sound values. One of my friends was praised by an intern for instructing him to find the root cause of a problem, not just patch the problem they knew about.

        And checking is good. I once found an error in navigation software written by an experienced programmer who had a pilot license. In testing I found it turned the airplane the wrong way if a parameter in the route was of a certain value. Apparently the software had not been thoroughly tested before; I was testing an update. Embarrassment all around.
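
The comment does not say which parameter triggered the wrong turn, so the following is only a sketch of one common class of such bugs: heading wraparound at 360 degrees. The function name and the test values are assumptions made for illustration, not the software described above.

```python
def turn_direction(heading_deg: float, target_deg: float) -> str:
    """Return the shorter turn direction from heading to target."""
    # Normalise the difference into [-180, 180) so that a turn across
    # north (e.g. 350 -> 010) is seen as a 20 degree right turn.
    diff = (target_deg - heading_deg + 180.0) % 360.0 - 180.0
    if diff == 0:
        return "none"
    return "right" if diff > 0 else "left"


def sweep_ok() -> bool:
    """Check a 20 degree right turn from every whole-degree heading."""
    return all(
        turn_direction(h, (h + 20) % 360) == "right" for h in range(360)
    )


if __name__ == "__main__":
    print(turn_direction(350, 10))  # right
    print(turn_direction(10, 350))  # left
    print(sweep_ok())               # True
```

A naive `target - heading` comparison would pass for most routes and fail only for values that straddle north, which is the kind of error that a sweep over the whole input range tends to catch and a handful of spot checks does not. Note that targets exactly 180 degrees away come out as "left" here; that tie-break is an arbitrary choice in this sketch.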

  41. Can anyone clearly explain whether the (simulated) hardware fault caused only the nose pitch down or is behind the CPU overload as well? Because there are two distinct interpretations of what Bjorn has written:

    1) FAA tests resilience to hardware faults, induced fault pitches nose down, AND that pitch down is difficult to recover from due to overload, since the same hardware fault overloads the (other, working) CPU

    2) FAA tests resilience to hardware faults, induced fault pitches nose down, AND from such pitch down is difficult to recover due to CPU overload, regardless of the presence of the hardware fault.

    I cannot see how scenario 1) would be possible in a fully (properly) redundant system. It is possible only if computers are not fully redundant and have some shared tasks.

  42. Thanks Bjorn, but reports are too terse to understand what function is involved, AvWeb has a different description from somewhere.

    Some people have been claiming there is a ‘switch’ at the bottom of each control column that is activated when the column is pulled hard aft. I know there was a force sensor there that is probably used in normal operation for the ‘Control Wheel Steering’ function, at least on the original 737s (-100/-200).

    Some people are claiming that ‘switch’ at the bottom of the column does not stop stabilizer movement on the MAX whereas it does on earlier models.

    I’ve _speculated_ that the reported processing delay is the cause of that, but if it involves force sensor data it may show up in normal operation (I can’t think of what other data would be processed even in an emergency situation).

    People are using ‘switch’ too loosely, even Peter Lemme is not clear.

      • Terminology confusion continues. Sigh.

        The ‘cutout’ switches most people refer to are on what is referred to as the aisle stand/pedestal, behind the throttles.

        I don’t know what the switch or equivalent force sensor interpretation said to be on the bottom of the control column is called.

        (The normal trim command switches are of course on the control wheels on top of the control column, some people don’t distinguish that from the bottom.)

        • Keith,
          On this web page

          look for the diagram marked
          “Boeing 737NG Stabilizer Control” (about the 10th diagram)
          it has “2 column cutout switches”, “2 Stab trim cutout switches” (the ones referred to in the emergency trim runaway checklist), and a switch box marked STAB TRIM OVERRIDE. That’s the one that puzzles me. Is this only on the NG model and not the MAX? What does it do? The text mentions: “Stab trim override provides a means to override the cutout function (if it failed and prevented electric trim operation).” So if this override switch is somewhere on the MAX, is MCAS tied into it or not?

          • Thank you.

            I’ll study that more offline.

            Note it is near an audio selector panel, perhaps the one for the second observer’s seat, which may be on the overhead panel. IIRC the first observer seat on the original 737 was to the right of the seat in the side wall. But that illustration shows knobs of another one to the left, so perhaps the switch is on the aisle stand (with an audio selector panel for each pilot forward of it).

            I see a small rudder trim knob, I recall a big knob on the back of the aisle stand of the original 737.

            Observe the confusion of terminology, ‘cutout’ for both the switches on the bottom of the control column and for the lever-lock switches on the aisle stand behind the throttles?

  43. If you take a 737-MAX up to altitude, disable MCAS, and pitch up to stick shaker, can you recover using only elevator, throttle and the yoke trim switch?
    I’m basically laying out a stall recovery test for certification (without MCAS).
    If unable to recover properly for certification, would there then be reluctance by Boeing to shut off MCAS, since it would be a required item on the aircraft,
    as it seems to be currently?
    According to the Aviation Week and Space Technology article, the FAA simulation flight testers want a faster stabilizer response from the yoke trim switch.
    Boeing also originally modified MCAS to a faster rate of trim.
    Does this indicate that the 737-MAX elevator doesn’t have enough authority, by itself, in a stall entry situation? That it also requires quick stabilizer response
    in order to recover from a stall in a 737-MAX?

    BTW, why does MCAS fire for 10 seconds, and then lie in wait for 5 seconds, before attacking again? Why not just fire, until the AOA is back to ‘normal’?

  44. Just posted in PPRuNE thread on switches on NG:

    “Accuracy please.

    Precise terminology is part of what is necessary to achieve that.

    Obviously there are stabilizer trim command switches on each control wheel, which is at the _top_ of the control column.

    Obviously there are stabilizer trim cutout switches on the aisle stand/pedestal/whatever, behind the throttles. (‘Override’ would not be the best term for them, as they disable stabilizer electric trim.)

    And of course the manual trim wheels beside the throttles, each having a handle that flips out from a detent. Each has a white stripe to help detect movement, offset in angle from each other to increase the likelihood of seeing movement.

    The question is whether or not there is a function resulting from something at the _bottom_ of each control column, that stops stabilizer trimming when the column is pulled hard back. (People refer to a ‘switch’, it could conceivably be interpretation by a computer of the force sensor that is there.) Hidden from pilots.

    _Perhaps_ the slow processing discovered by the FAA involves that function.

    And in this PPRuNe thread there is a suggestion of a switch that crew can use to override an automatic override performed by an FCC in certain conditions.

    (Plus now there’s a claim out of Europe that the autopilot doesn’t always ‘properly disengage’ when commanded. I presume command is by a switch on the AP control panel on the glareshield.)

    Read AvWeb and Bloomberg”
