February 19, 2018

Software in dangerous places

Software increasingly manages the world around us, in subtle ways that are often hard to see. Software helps fly our airplanes (in some cases, particularly military fighter aircraft, software is the only thing keeping them in the air). Software manages our cars (fuel/air mixture, among other things). Software manages our electrical grid. And, closer to home for me, software runs our voting machines and manages our elections.

Sunday’s NY Times Magazine has an extended piece about faulty radiation delivery for cancer treatment. The article details two particular fault modes: procedural screwups and software bugs.

The procedural screwups (e.g., treating a patient with stomach cancer with a radiation plan intended for somebody else’s breast cancer) are heartbreaking because they could be completely eliminated through fairly simple mechanisms. How about putting barcodes on patient armbands that are read by the radiation machine? “Oops, you’re patient #103 and this radiation plan is loaded for patient #319.”
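
A barcode check like this is almost trivial to sketch. The fragment below is purely illustrative; the function name and ID format are invented, not taken from any real device:

```python
# Hypothetical sketch: refuse to enable the beam unless the scanned
# armband barcode matches the patient ID in the loaded treatment plan.
# All names and ID formats here are invented for illustration.

def verify_patient(scanned_id: str, plan_patient_id: str) -> bool:
    """Return True only if the armband matches the loaded plan."""
    if scanned_id != plan_patient_id:
        print(f"MISMATCH: armband {scanned_id}, plan loaded for {plan_patient_id}")
        return False
    return True
```

The check costs almost nothing and removes an entire class of human error from the critical path.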

The software bugs are another matter entirely. Supposedly, medical device manufacturers and software-correctness people have all been thoroughly indoctrinated in the history of the Therac-25, a radiation machine from the mid-80’s whose poor software engineering (and user interface design) directly led to several deaths. This article seems to indicate that those lessons were never properly absorbed.

What’s perhaps even more disturbing is that nobody seems to have been deeply bothered when the radiation planning software crashed on them! Did it save their work? Maybe you should double-check? Ultimately, the radiation machine just does what it’s told, and the software that plans out the precise dosing pattern is responsible for getting it right. Well, if that software is unreliable (which the article clearly indicates), you shouldn’t use it again until it’s fixed!

What I’d like to know more about, and which the article didn’t discuss at all, is what engineering processes, third-party review processes, and certification processes were used. If there’s anything we’ve learned about voting systems, it’s that the federal and state certification processes were not up to the task of identifying security vulnerabilities, and that the vendors had demonstrably never intended their software to resist the sorts of attacks that you would expect on an election system. Instead, we’re told that we can rely on poll workers following procedures correctly. Which, of course, is exactly what the article indicates is standard practice for these medical devices. We’re relying on the device operators to do the right thing, even when the software is crashing on them, and that’s clearly inappropriate.

Writing “correct” software, and further ensuring that it’s usable, is a daunting problem. In the voting case, we can at least come up with procedures based on auditing paper ballots, or on various cryptographic techniques, that allow us to detect and correct flaws in the software (although getting such procedures adopted is a daunting challenge in its own right, but that’s a story for another day). In the aviation case, which I admit to not knowing much about, I do know they put in sanity-checking software that will detect when the more detailed algorithms are asking for something insane and will override it. For medical devices like radiation machines, we clearly need a similar combination of mechanisms, both to ensure that operators don’t make avoidable mistakes, and to ensure that the software they’re using is engineered properly.
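
That kind of outer sanity check is easy to sketch. This is an illustrative toy, not a real dosimetry API; the per-fraction limit and all names are assumptions:

```python
# Illustrative sketch of an independent sanity check: whatever the planning
# algorithm requests, an outer layer rejects values outside a hard safety
# envelope. The limit value and function names are invented.

SAFE_DOSE_LIMIT_GY = 2.0   # hypothetical maximum per-fraction dose

def checked_dose(requested_gy: float) -> float:
    """Pass a requested dose through only if it is inside the envelope."""
    if not (0.0 <= requested_gy <= SAFE_DOSE_LIMIT_GY):
        raise ValueError(f"requested dose {requested_gy} Gy is outside the safe envelope")
    return requested_gy
```

The point is only that the guard is independent of, and far simpler than, the planning algorithm it checks, so a bug in the planner can’t silently produce an insane command.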

Comments

  1. The medical device field is wholeheartedly embracing static source-code analysis, runtime analysis, and other pieces of software to help them find their bugs. While the concerns are valid, the clients at the place I work, at least, are trying to do something about them.

    • Syd, I’d appreciate it if you could comment on the degree to which this level of discipline is self-imposed, or whether it’s being pushed for, and checked by, external agencies involved in certifying or testing the machines.

    • Early on, every hospital forbade the use of cellphones: the devices could interfere with the electronics, distort the results, and endanger patients’ lives. But that was then! Now, apparently, we are supposed to be delighted that we are dependent on medical software.

  2. Glenn Kramer says:

    Dan, at least in the aircraft space there are good reasons for being wary of “sanity checking,” or at least for allowing for the insane. Airbus, for example, is a big fan of “hard” flight envelope controls. But insanity is the better option sometimes: a pilot may want to climb at a rate that stresses the aircraft rather than face the certain death of slamming into an upcoming mountain. If the plane’s control software doesn’t allow an override, it’s game over. Boeing likes “soft” flight envelopes, with the ability of the pilot to override. This isn’t an abstract example. Check out China Airlines flight 006, in which the pilots recovered from a sudden uncontrolled roll/dive, in the process permanently deflecting the wings and losing chunks of the horizontal stabilizer. Better than having the flight computer say, “I’m sorry Dave, I can’t allow you to stress the plane.” There are also numerous comp.risks postings about fly-by-wire.
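
The hard-versus-soft distinction can be made concrete with a toy sketch. The load-factor limit and function names below are invented, not anything from Airbus or Boeing:

```python
# Toy contrast between a "hard" envelope, which silently clamps the pilot's
# command with no recourse, and a "soft" envelope, which resists by default
# but yields to an explicit override. All numbers are hypothetical.

G_LIMIT = 2.5  # invented structural load-factor limit

def hard_envelope(commanded_g: float) -> float:
    """Hard limit: the command is clamped, no override possible."""
    return min(commanded_g, G_LIMIT)

def soft_envelope(commanded_g: float, override: bool = False) -> float:
    """Soft limit: clamped by default, but the pilot can push through."""
    return commanded_g if override else min(commanded_g, G_LIMIT)
```

In the mountain scenario, only the soft version lets the pilot trade airframe stress for survival.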

    • I think that the ticking-time-bomb torture arguments have made me a bit allergic to arguments from scary rare cases.

      For example, the article you reference describes three pilot errors (failure to descend before restart, restart attempt with autopilot engaged, failure to use rudder after disconnecting autopilot). It was not necessary to deform the plane — the article claims that the correction occurred within 500 (vertical) meters of the pilots orienting themselves, leaving 2900 meters between them and the ground. Half the acceleration would require double the distance (as I understand it), still leaving 1900 meters of clearance to the ground.

      What’s not clear is where the lines are drawn for FBW control — is it up to FBW to follow the restart protocol? I’d hope that a decent autopilot/FBW system would prevent at least one of the errors that set this near-crash in motion.

  3. Khürt L Williams says:

    I’ve also read reports of people driving into swamps because the GPS “told” them to. Perhaps we need to re-examine our blind faith in machines.

    • Flamsmark says:

      Those stories make me think that we should re-examine our blind faith in people. The navigation system is doing exactly what’s intended, with all the information available to it. The driver is not: they can see the swamp, but ignore it anyway.

  4. Martin CT says:

    The article suggests that the worst failures are really gross, e.g., running with all beam control filters removed. You really need an external, non-software (out-of-band) check procedure. Why not put an X-ray dosimeter on the patient, wired to sound an alarm? Or add an optical verification that makes the radiation target area immediately visible to staff?

    Good software design and testing are obviously important. (Perhaps Windows and other unsafe OS’s should be banned?) But in safety-critical situations, you need independent checks and verifications, assuming that hardware and software can fail.
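
The out-of-band dosimeter idea reduces to a one-line comparison. The sketch below is hypothetical, with an invented alarm margin:

```python
# Sketch of an out-of-band check: an independent dosimeter reading is
# compared against the planned dose, and an alarm trips if delivery
# exceeds the plan by some margin. Names and the margin are invented.

def dose_alarm(measured_gy: float, planned_gy: float, margin: float = 0.1) -> bool:
    """Return True (alarm) if measured dose exceeds the plan by > margin."""
    return measured_gy > planned_gy * (1.0 + margin)
```

The value of such a check is precisely that it shares no code, and no failure modes, with the planning software it cross-checks.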

    • Bryan Feir says:

      That was one of the issues with the aforementioned Therac-25, which we studied back when I was an engineering student at the University of Waterloo. (I took a Systems Design course on reliability and human error.)

      The previous system had hardware interlocks that physically prevented the electron beam from being used at full power unless the ‘target’ was in place to convert the beam to X-rays and spread them out. The people who built the Therac-25 felt that the software interlock was good enough, and so removed the hardware interlocks, presumably to save costs. They then, however, used mostly the same code from the previous system, which contained a race condition bug that had been masked by the hardware interlocks.

      People died as a result.
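
The check-then-act gap behind that race can be shown with a toy model. This is not the actual Therac-25 code; every name below is invented:

```python
# Toy model of a check-then-act race; NOT the real Therac-25 code.
# A software-only interlock reads the configuration, and the beam fires
# later; a fast operator edit in between invalidates the earlier check.

state = {"target_in_place": True}   # the X-ray target spreads/attenuates the beam

def interlock_check(s: dict) -> bool:
    """'Software interlock': approve firing only if the target is in place."""
    return s["target_in_place"]

# Deterministic replay of the bad interleaving:
approved = interlock_check(state)     # 1. check passes under the old setup
state["target_in_place"] = False      # 2. a rapid operator edit withdraws it
fires_unsafely = approved and not state["target_in_place"]  # 3. stale approval
```

A hardware interlock differs precisely here: it is re-evaluated by physics at the instant of firing, whereas the stale software approval is not.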

  5. Neil Prestemon says:

    In school, I was taught that avionics software is written by three separate (isolated) teams, in three separate languages – and the three systems “vote” on how to respond to sensor inputs, and the odd man out is rejected.
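
A minimal sketch of that 2-of-3 voting (triple modular redundancy); the names are illustrative:

```python
# Sketch of 2-of-3 voting: three independent implementations each compute
# a response, the majority wins, and the odd man out is rejected.

from collections import Counter

def vote(a, b, c):
    """Return the majority value of three redundant channels.

    Raises if all three channels disagree (no majority exists)."""
    winner, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three channels disagree")
    return winner
```

The scheme tolerates one faulty channel per input, which is why the three implementations must fail independently — hence the separate teams and languages.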

    We also studied the Therac-25 case in an IT-ethics class. Ethics classes don’t often teach you the “right” answers; they teach students how to think about ethics. It’s not always as simple as cutting corners: sometimes perfectly ethical managers will make calls like that without the necessary technical insight, while counting on honest feedback from engineers, who remain silent, or may speak up but perhaps not forcefully enough to trigger a management decision (perhaps out of fear of being fired, or of being perceived as “not a team player,” etc.).

    I also have some career experience with range-safety software (i.e., rocket launches), which is a completely different animal, the procedure being rigorously controlled: a “likely” flight path is simulated in software, based on past performance data from that particular vehicle flown in that configuration (test flights, real flights, etc.), using both telemetry and captured radar. It’s pretty simple here: if the radar sees the vehicle stray past the safety area, the Range Safety Officer hits the button, sending the destruct command and destroying the vehicle. The old-timers have many “horror stories” about early missile tests in the 40’s and 50’s that went off-course.

    So – I recently underwent LASIK surgery. Who hasn’t heard the horror stories? I asked my doctor as many questions as I could think of about how the system worked. He answered them, and he had a lot of confidence in the software’s ability to correctly model a perfect refractive solution. The “rare cases” occur when people heal differently, or something happens post-op (the patient gets an eye infection, has an allergic reaction, rubs their eye and tears the corneal flap, etc.). While LASIK is a proprietary system and we can’t examine the math or how the beam is controlled, they do admit that the system can only monitor whether the patient moves (sneezes, flinches, whatever) during the procedure, and then shut off (and pick up where it left off when the operator confirms the patient is stable again). It can’t “know” if part of the cornea vaporizes faster than their model predicts; that’s one of the “unknowns” that can’t be accounted for and that can affect the outcome (I guess). But does this happen? Of the hundreds of thousands of people who’ve had this done, there’s a 99.9% success rate. That’s still a lot of failures, but this doctor has been doing it for over 10 years, and claims to have never had any bad outcomes not related to post-op mischief. Clearly, very careful procedures and experience by the operator play a big role.

    I can say my LASIK outcome, 3 weeks post-op, has far exceeded expectations.

    In all these cases, there’s human involvement at one stage or another. Sound engineering practices, in combination with operating procedures carried out by experienced and professional operators, are what keep us safe in the middle of that bell curve. The edge cases are a matter of having staff who understand risk assessment during the development of the system; who can accurately assess the risk of these events, and the cost of mitigating them.

    In software engineering, particularly for voting machines, it’s not clear that this amount of rigor is applied. And in the case where an integrator (like Diebold) is buying and integrating proprietary components from vendors, there are layers upon layers of obscurity in the process. On the other hand, the purchasing decision for voting machines is being made by elected officials, perhaps ethically conflicted ones, who have no background in software engineering. A recipe for something not good.

  6. Tobias D. Robison says:

    Especially when lives are at stake, the programmer’s first concern should be to set boundary checks so that the machine being controlled will do nothing dangerous. I got my first experience in using a machine with no boundary checks at all in 1970. It was a wide-bed plotter that drew pictures on a paper roll 30″ wide. In my very first program, I mistakenly programmed the stylus to move 1/4″ down and draw a 6″ line. It drew a 6″ line, 1/4″ deep in the metal bed of the plotter, which had to be replaced at a cost of $1,000. That’s when I realized that there was no software driver for this plotter to protect me from disastrous mistakes!
    – tobias d. robison
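
The missing boundary check from the plotter story amounts to clamping every command to the bed’s physical limits before it reaches the hardware. The sketch below invents the axis conventions; only the 30-inch width comes from the anecdote:

```python
# Sketch of the driver-level boundary check the plotter lacked. The paper
# width (30 inches) is from the story; the axis conventions (z = height
# above the paper surface) are assumptions made for illustration.

BED_WIDTH_IN = 30.0   # paper roll width from the anecdote
MIN_PEN_Z = 0.0       # paper surface; the pen must never cut below it

def safe_command(x: float, z: float) -> tuple[float, float]:
    """Clamp a stylus command: x stays on the bed, z never goes below the paper."""
    return (min(max(x, 0.0), BED_WIDTH_IN), max(z, MIN_PEN_Z))
```

With such a driver in place, the mistaken quarter-inch plunge would have been clamped to the paper surface instead of engraving the metal bed.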

  7. “How about putting barcodes on patient armbands that are read by the radiation machine? ‘Oops, you’re patient #103 and this radiation plan is loaded for patient #319.'” – but someone will enter the wrong data elsewhere. We could just have a big, flashing, neon sign saying Stomach Cancer Program and let the patient (who is more than a piece of meat under industrial processing) raise objections if they know they have something different.

    There are techniques that ensure mathematical provability of code, but they are tedious and thus expensive. (I don’t have my references – I think greenleaf integrity is an example, and there is a methodology or two which constrain things.)

    There is no excuse for not having interlocks, but they should be in hardware: delivering an excess dose of radiation should require doing something beyond hitting a key. (That doesn’t apply to aircraft, but here one has time to hit a discrete override switch to break a safety.)

  8. Lina Inverse says:

    I recently reviewed the Therac-25 Charlie-Foxtrot, and one thing that came out was how simply not using the machines wasn’t a perfect solution. Obviously you ran a risk of killing a patient, but if they didn’t get their radiation therapy there was a good chance they’d die from their cancer not getting eradicated.

    Still, every time I read an account of a medical technician bulling ahead when the software is crashing and burning, I cringe; the fact that they don’t find this all that unusual suggests bad things about the quality of the spectrum of software they use every day (granted, the Therac-25 was a long time ago, when our expectations of system quality were low).

    “Why not put an X-Ray dosimeter on the patient, wired to sound an alarm?”: One issue here is that dangerous or fatal doses can be delivered before someone could respond to the audible alarm; also, the patient will tend to be in a chamber separated from the operator, to limit the latter’s cumulative workplace exposure.