Top 10 Common Practices that Lead to Software Failures

#3 Requirements testing is necessary but insufficient

Since the 1980s there has been an “either-or” mentality to software testing: either you test the requirements or you test the design and code. It is a common myth that testing the requirements is all that is needed. In fact, if requirements-based testing were sufficient, there would be no failed projects and no world events caused by software failures. Clearly this popular testing approach is not working. The facts[1] show that organizations that deliver reliable software on time do both requirements and design/code testing. Here is why.

Testing only the requirements may, at best, cover 40% of the code. Requirements-based testing won’t cover (an illustrative sketch follows this list):

· Endurance or peak loading (Caused the Iowa Democratic Primary Caucus and Scud missile attack failures)
· Timing (Caused the Therac-25 and 2003 Northeast blackout failures)
· Data definition (Caused the Ariane 5 and F-22 International Date Line failures)
· State transitions (Multiple events due to dead states, prohibited state transitions, etc.)
· Logic (Caused the AT&T Mid-Atlantic outage in 1991)
· Fault injection (Incorrect fault handling caused the Apollo 11 lunar landing, Qantas Flight 72, Solar and Heliospheric Observatory spacecraft, and NASA Spirit rover failures)
· Requirements that are missing crucially important details (Another cause of the F-22 International Date Line failure)
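As a concrete example of a test that rarely falls out of the written requirements, here is a minimal fault-injection sketch in Python. The PressureMonitor class and its sensor are hypothetical, not taken from any of the systems above; the point is simply that the error-handling branch only runs when a fault is deliberately injected.

```python
# Hypothetical sketch: a fault-injection test for a sensor-reading routine.
# Names (PressureMonitor, read) are illustrative, not from any real system.
import unittest
from unittest import mock


class PressureMonitor:
    """Toy component that reads a sensor and falls back to a safe default on failure."""

    SAFE_DEFAULT = 0.0

    def __init__(self, sensor):
        self.sensor = sensor

    def current_pressure(self):
        try:
            return self.sensor.read()
        except IOError:
            # Fault-handling path: rarely reachable from requirements-derived
            # "happy path" tests alone.
            return self.SAFE_DEFAULT


class FaultInjectionTest(unittest.TestCase):
    def test_sensor_failure_returns_safe_default(self):
        # Inject the fault: force the sensor to raise instead of returning data.
        failing_sensor = mock.Mock()
        failing_sensor.read.side_effect = IOError("simulated sensor dropout")

        monitor = PressureMonitor(failing_sensor)
        self.assertEqual(monitor.current_pressure(), PressureMonitor.SAFE_DEFAULT)


if __name__ == "__main__":
    unittest.main()
```

A requirements-only test suite would typically stop at the nominal reading; the injected fault is what exercises the code that runs when things go wrong.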


Why is this approach so popular? Answer:

· Engineers have difficulty understanding “necessary but not sufficient.”
· People who don’t understand software engineering started this myth in the 1980s because they assumed that testing all requirements is equivalent to testing all code.
· Software engineers hate to test design and code, and hence do their best to propagate this myth.


So how does one change this popular but ineffective approach? Requirements management tools such as DOORS present obstacles to testing anything but the requirements. Some alternatives include:

· Pull more details into the software requirements specification (SRS). The more detailed the SRS, the more code coverage you get when testing.
· Include pictures and tables as informative content (#4 on the top ten list)
· Develop testable requirements for what can go wrong (#8 on the top ten list; see the sketch after this list)
· Include testing of the design (#6 on the top ten list)
· Test the mission and not just one requirement at a time (#8 on the top ten list)
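To illustrate what a testable “what can go wrong” requirement might look like, here is a small sketch that checks a prohibited state transition. The pump states, the allowed-transition table, and the PumpController class are made up for this example.

```python
# Illustrative sketch: a prohibited state transition is rejected rather than
# silently accepted. State names and PumpController are hypothetical.
import unittest

ALLOWED = {
    "IDLE": {"PRIMING"},
    "PRIMING": {"RUNNING", "IDLE"},
    "RUNNING": {"STOPPING"},
    "STOPPING": {"IDLE"},
}


class PumpController:
    def __init__(self):
        self.state = "IDLE"

    def transition(self, new_state):
        # Reject anything not explicitly allowed from the current state.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"prohibited transition {self.state} -> {new_state}")
        self.state = new_state


class ProhibitedTransitionTest(unittest.TestCase):
    def test_cannot_jump_from_idle_to_running(self):
        controller = PumpController()
        with self.assertRaises(ValueError):
            controller.transition("RUNNING")  # must prime first


if __name__ == "__main__":
    unittest.main()
```

Writing the failure case as a requirement, and then as a test, makes the dead states and forbidden transitions visible long before they show up in the field.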

Perspective from an experienced engineering leader

There’s an old adage in software development that “you can’t test in quality.” Others state that “software doesn’t break.” Both miss the point as they relate to testing code in development. Highly Accelerated Life Testing (HALT) and Highly Accelerated Stress Screening (HASS) for electro-mechanical systems demonstrate that you can, and must, use testing that forces components to failure in order to identify and correct inherent design weaknesses and to objectively characterize the reliability of the system as a whole. While it is arguable that software does not “break” in the classic sense of a physical change that no longer conforms to original specifications, it does “fail” when it operates in an unintended manner, with consequences ranging from undetectable by the user to catastrophic, depending on the nature of the failure and its effect on application or system operation. Just like the breakage of a mechanical component, software failures often require a repair action, although it may take the form of an application restart or system reboot rather than component reconditioning or replacement. A comprehensive approach to testing the software is necessary for ensuring software quality and reliability, and requirements-based testing alone is insufficient to meet these goals.

Requirements-based testing is, by definition, only as good as the written requirements on which it is based. Functions that were implemented but not explicitly defined in the requirement set may not be tested at all, while those that are ambiguous or lack sufficient definition of edge cases may be only partially tested. In a complex application or system, there is often a one-to-many or many-to-many relationship between requirements and test cases. In these situations, the completeness of the test cases depends on the level of understanding and degree of rigor of the engineers developing and executing them.

To ensure user satisfaction, it is important that a subset of the requirements reflect the user needs of the application or system, from which lower-level requirements can then be derived. The US Food and Drug Administration (FDA) refers to the demonstration of fulfillment of user needs as Validation, with a demonstration of conformance to requirements defined as Verification. Including actual or representative users during Validation testing is an important element in demonstrating that their needs are met by the application as developed.
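One lightweight way to keep that one-to-many relationship visible is to tag each test case with the requirement it verifies. The sketch below uses pytest parametrization; the requirement IDs and the clamp() function are illustrative assumptions, not part of any particular SRS.

```python
# Minimal sketch: making requirement-to-test-case traceability explicit.
# Requirement IDs (SRS-017, SRS-018) and clamp() are made up for illustration.
import pytest


def clamp(value, low, high):
    """Return value limited to the inclusive range [low, high]."""
    return max(low, min(high, value))


# Each case carries the requirement it verifies, so results can be rolled up
# per requirement as well as per test.
CASES = [
    ("SRS-017", 5, 0, 10, 5),    # nominal value passes through
    ("SRS-017", -3, 0, 10, 0),   # below range clamps to lower bound
    ("SRS-018", 42, 0, 10, 10),  # above range clamps to upper bound
]


@pytest.mark.parametrize("req_id,value,low,high,expected", CASES)
def test_clamp_traceable(req_id, value, low, high, expected):
    assert clamp(value, low, high) == expected, f"failed requirement {req_id}"
```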

The only way to fully understand the completeness of testing is to use coverage tools that monitor the application in operation and determine which lines of code are executed during the test cycle. Even full structural coverage is not enough, as the code may operate differently based on conditional flow and the variables that determine that flow. The Federal Aviation Administration (FAA) has adopted a standard for code coverage requirements based on the severity of the consequence of a software failure. DO-178B/C defines five levels of risk for software components, as shown in the table below.


Software level   Failure condition severity   Structural coverage objectives
A                Catastrophic                 MC/DC, decision, and statement coverage
B                Hazardous                    Decision and statement coverage
C                Major                        Statement coverage
D                Minor                        None beyond requirements-based test coverage
E                No safety effect             None

As you can see from the table, in avionics software requirements testing is sufficient only for Level D components, which have at most a minor impact on safety in the event of failure. While I do not believe it is necessary to test to equivalence with Level A requirements for all software, I do believe that testing of most commercial applications falls far short of even Level C requirements. If the code is not tested to the extent of its potential operating conditions, it cannot be a surprise when it behaves in an unintended fashion.
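A short sketch illustrates the gap between statement coverage and condition coverage. The interlock function below is hypothetical; two tests can execute every line, and even both branch outcomes, without ever demonstrating that the second condition matters on its own, which is exactly what MC/DC-style coverage (Level A above) forces.

```python
# Hypothetical example (not taken from DO-178C): full statement coverage can
# still leave condition combinations untested.
def release_interlock(door_closed: bool, pressure_ok: bool) -> str:
    # Both conditions sit in one compound decision, so any test that reaches
    # this line counts it as "covered".
    if door_closed and pressure_ok:
        return "RELEASE"
    return "HOLD"


def test_statement_coverage_is_not_enough():
    # These two cases execute every line and both branch outcomes ...
    assert release_interlock(True, True) == "RELEASE"
    assert release_interlock(False, True) == "HOLD"
    # ... yet a version of the code that ignored pressure_ok entirely would
    # also pass them. The case below shows pressure_ok independently changing
    # the outcome, which is what MC/DC coverage requires of Level A software.
    assert release_interlock(True, False) == "HOLD"
```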

Test automation is generally required to effectively and efficiently reproduce the conditions that drive variations in conditional flow, including the error handling that may result in unintended or undesired operating behavior. It is important that automated testing recreate, as much as possible, the operational conditions the application will experience throughout its lifecycle. After completion of unit test by the development engineer, along with integration test for components delivered by multiple developers, latent failures generally are not encountered the first time or two a function is called. Hundreds, thousands, or millions of cycles may be required before the conditions that produce the failure arise. These conditions may be the result of common coding issues such as resource or memory leaks or insufficient storage capacity. Manual testing cannot efficiently execute the number of cycles required for these issues to be exhibited, nor is it conducive to reproducing the behavior when a failure is encountered (a minimal endurance-test sketch follows below).

In summary, a comprehensive approach to testing, which includes confirmation of user needs, an objective measure of the code exercised, and the use of automation to emulate lifecycle conditions in addition to demonstration of full requirements coverage, is necessary to ensure the quality and reliability of the code being delivered.
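To make the endurance point concrete, here is a minimal soak-test sketch. The parse_message() function, its deliberate leak, and the growth threshold are assumptions for illustration; the idea is simply that many automated cycles plus an objective resource measurement expose what one or two calls never will.

```python
# Minimal endurance-test sketch: repeat an operation thousands of times and
# watch for monotonic memory growth. parse_message() and the threshold are
# illustrative, not a real benchmark.
import gc
import tracemalloc

_cache = []  # deliberate "leak" so the sketch has something to find


def parse_message(payload: bytes) -> dict:
    record = {"payload": payload, "length": len(payload)}
    _cache.append(record)  # bug: records are never released
    return record


def soak_test(cycles: int = 10_000) -> None:
    tracemalloc.start()
    gc.collect()
    baseline, _ = tracemalloc.get_traced_memory()

    for _ in range(cycles):
        parse_message(b"x" * 64)

    gc.collect()
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    growth = current - baseline
    # A single call would never expose this; thousands of cycles make the
    # leak measurable long before it exhausts memory in the field.
    assert growth < 100_000, f"memory grew by {growth} bytes over {cycles} cycles"


if __name__ == "__main__":
    try:
        soak_test()
        print("no significant growth detected")
    except AssertionError as exc:
        print(f"endurance test caught a leak: {exc}")
```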


Tom Neufelder, Retired Senior Vice President Philips Healthcare, Diagnostics Imaging
Next month we will discuss the fourth most ineffective software development practice – “Using words when pictures are better”.
[1] Ann Marie Neufelder, “The Cold Hard Truth About Reliable Software”, Edition 6i, 2019.

Since 1993, we have been benchmarking the reliability of software-intensive, mission-critical systems across more than 150 software projects spanning the defense, space, aerospace, energy, electronics, healthcare and other industries. Our benchmarking database records the actual outcome of each software project as successful, mediocre or distressed, per objective criteria.

We have identified 10 development practices that can be replaced with 10 practices that are proven effective at improving software reliability and on-time delivery.

This is the third installment of a 10-part educational series and covers why requirements-based testing is necessary but insufficient. We will also hear some key insights about each practice from a senior engineering leader in the medical device, defense and avionics industries with over 35 years of experience. See last month’s installment if you missed it.