"Critical embedded systems" defined

As you may have come to realize, computers are used in many things never thought possible a few short years ago. These computers are “embedded” into devices, giving them intelligence that improves functionality. These intelligent devices can do more, and can do it faster than ever before. But not all embedded computers are created equal. Some have more demands placed on them than others. Many of these systems must be “able” in many dimensions: dependable, supportable, configurable, reliable, serviceable … and these systems must operate flawlessly to protect life, property, equipment, and the environment.

The term “embedded computer” is very broad and has no universally accepted definition, so it is often unclear what is implied. In 2005, VITA set out to define a narrower term that described the largest share of the applications in which VITA technology was deployed. The research began with the existing definitions of life-critical and safety-critical systems. From there, the term “critical embedded systems” was chosen; the challenge was to define it clearly enough to capture the systems being described.

After several weeks of discussions, “critical embedded systems” was defined as: life-critical or safety-critical systems where failure or malfunction might result in:

Serious injury to people, or
Loss or severe damage to equipment, or
Environmental harm

The established definitions of “life-critical” and “safety-critical” were followed, with one exception: the term is restricted to high-performance, distributed computing systems that:

Manage high-bandwidth I/O
Involve real-time processing
Are constrained by their environment in Size, Weight, and Power (SWaP)

Ray Alderman, Executive Director of VITA, described several requirements that a critical embedded system must meet to fit within this description. These are systems that must survive in harsh environments: severe shock and vibration, extreme temperatures from low to high, and contamination from dust, dirt, oil, salt spray, corrosive gases, and many other contaminants.

The boards and boxes within critical embedded systems must be designed for long life-cycle applications; these systems are often not replaced or updated for many years. They do not become obsolete every few months like personal computers and consumer goods. Revision management is critical because the product cannot vary from one production lot to the next over its life cycle. These systems therefore cannot use components or software that change every few months. According to Alderman, if they do, they cannot possibly be called “critical embedded systems.”

Critical embedded systems are designed with reliability as a primary requirement. They reach the desired levels of reliability and Mean Time Between Failures (MTBF) through redundancy rather than hot swapping or live insertion of blades. Risks are usually managed with the methods and tools of safety engineering. A life-critical system is designed to lose fewer than one life per billion hours of operation. Typical design methods include probabilistic risk assessment, which combines failure modes and effects analysis (FMEA) with fault tree analysis.
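To make the effect of redundancy concrete, the following is a minimal sketch of the arithmetic behind a very small fault tree. It assumes a constant failure rate per unit (an exponential failure model), independent failures, and no repair; the failure rate, mission length, and system layout are hypothetical values chosen only for illustration, and all function names are invented here.

```python
import math


def unit_failure_probability(failure_rate_per_hour: float, hours: float) -> float:
    """P(a unit fails within `hours`), assuming a constant failure rate."""
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-failure_rate_per_hour * hours)


def and_gate(probabilities):
    """Fault tree AND gate: the output fails only if every input fails.

    This is how redundant units combine, assuming independent failures.
    """
    return math.prod(probabilities)


def or_gate(probabilities):
    """Fault tree OR gate: the output fails if any single input fails."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


def redundant_mttf(failure_rate_per_hour: float, n_units: int) -> float:
    """Mean time to failure of n identical active-parallel units, no repair."""
    harmonic = sum(1.0 / k for k in range(1, n_units + 1))
    return harmonic / failure_rate_per_hour


if __name__ == "__main__":
    rate = 1e-5      # hypothetical: one failure per 100,000 hours per unit
    mission = 10.0   # hypothetical mission length in hours

    q = unit_failure_probability(rate, mission)
    processors = and_gate([q, q, q])  # triple-redundant processors
    power = and_gate([q, q])          # dual-redundant power supplies
    system = or_gate([processors, power])

    print(f"single unit failure probability: {q:.3e}")
    print(f"system failure probability:      {system:.3e}")
    print(f"MTTF, one unit:    {redundant_mttf(rate, 1):,.0f} h")
    print(f"MTTF, three units: {redundant_mttf(rate, 3):,.0f} h")
```

In this sketch the least redundant branch dominates the system figure, which is why fault tree analysis is typically used to find where additional redundancy pays off. Real assessments also have to account for common-cause failures, which the independence assumption here ignores.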

Critical embedded systems are highly deterministic and hard real-time in their responses to events. When an event is detected, a predictable response must be delivered within a bounded time.

Cost, while important, is not the top priority. Trade-offs are frequently made in favor of the critical aspects of the application, but designers still have to be conscious of system costs in order to maintain a workable balance. Commercial Off-the-Shelf (COTS) solutions are readily available, helping to keep costs under control; however, these COTS products are not shipped in high volumes, nor do they use the same components used in personal computers. For these reasons, their unit costs are higher than those of consumer-grade products.

Critical applications keep us moving

There are many applications outside of military markets that need critical embedded systems. The intelligent highway, rail transportation, industrial controls, medical, scientific, space exploration, aviation, and many other applications meet the description of critical embedded systems.

Trains continue to be a major form of freight and passenger transportation. Nations around the world have automatic train control (ATC) and railway operation systems with components both on board trains and at the wayside. Positive train control systems use technology that is capable of preventing train-to-train collisions. Rail safety acts have mandated the widespread installation of these systems. Rail traffic management systems enhance interoperability and signaling throughout train control and command systems. High-speed rail makes it even more critical to improve the computing capabilities used in these systems. [Source: MEN Mikro Elektronik, Embedded Tech Trends 2012]

The Large Hadron Collider (LHC) at CERN, used by physicists to study the smallest known particles, depends heavily on critical embedded systems for the control of the physics experiments conducted at the laboratory. Controlling the beams requires the performance and reliability found in a critical embedded system. [Source: MEN Mikro Elektronik, Embedded Tech Trends 2012]

Synthetic Aperture Radar (SAR) systems are used for environmental monitoring, Earth-resource mapping, and military systems that require broad-area imaging at high resolutions. The imagery must often be acquired in inclement weather, and at night as well as during the day. SAR systems provide information that is critical to the safety and success of many military missions. [Source: Pentek, Inc., Embedded Tech Trends 2012]

Software plays a major role in critical embedded systems

The best, most reliable hardware in the world is only as good as the software that runs on it. While hardware failures are relatively easy to spot, software failures are not. Robert Dewar, President of AdaCore, suggests that the easy way to spot a software failure in a news story is to look for the term “glitch” in the report. Investigators will often attribute a catastrophic failure to a “glitch.”

Demand for safety-critical computer systems is increasing not only in air and ground transportation, but also in nuclear physics and critical industrial environments.

Even with the best-designed hardware, how do you remove the glitches? Dewar suggests that a three-step process can help tremendously.

Step 1: Write a set of rigorous high-level requirements that can be understood thoroughly.

Step 2: Derive detailed requirements that can lead to well-written code.

Step 3: Use the detailed requirements to generate tests that verify the code against the requirements.
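The three steps can be illustrated with a small sketch in which each test is tagged with the detailed requirement it verifies, so that untested requirements are easy to find. This is only one possible way to express the idea, not a method attributed to Dewar; the overspeed requirements, the 5 km/h margin, and every identifier below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str


# Step 1 (hypothetical): a rigorous high-level requirement.
HLR_1 = Requirement("HLR-1", "The train shall not exceed the permitted speed.")

# Step 2 (hypothetical): detailed requirements derived from HLR-1.
LLR_1 = Requirement("LLR-1", "Command braking when speed > limit + 5 km/h.")
LLR_2 = Requirement("LLR-2", "Do not command braking when speed <= limit + 5 km/h.")

OVERSPEED_MARGIN_KMH = 5.0  # hypothetical margin taken from LLR-1 and LLR-2


def brake_commanded(speed_kmh: float, limit_kmh: float) -> bool:
    """Code written against LLR-1 and LLR-2."""
    return speed_kmh > limit_kmh + OVERSPEED_MARGIN_KMH


# Step 3: tests generated from the detailed requirements, each tagged
# with the ID of the requirement it checks.
TESTS = [
    ("LLR-1", lambda: brake_commanded(86.0, 80.0) is True),
    ("LLR-2", lambda: brake_commanded(85.0, 80.0) is False),  # boundary value
    ("LLR-2", lambda: brake_commanded(60.0, 80.0) is False),
]


def run_tests(tests):
    """Return a pass/fail verdict per requirement ID."""
    results = {}
    for req_id, check in tests:
        results[req_id] = results.get(req_id, True) and check()
    return results


def uncovered(requirements, tests):
    """List the IDs of requirements that have no test at all."""
    covered = {req_id for req_id, _ in tests}
    return [r.req_id for r in requirements if r.req_id not in covered]


if __name__ == "__main__":
    print(run_tests(TESTS))
    print("untested requirements:", uncovered([LLR_1, LLR_2], TESTS))
```

Keeping the requirement ID next to each test means a failing test points straight back to the requirement that was violated, and a requirement with no tests shows up as a gap rather than going unnoticed.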

He adds that the single biggest change that could be made is to eliminate the tolerance for bad software. All too often, we accept the occasional “glitch” as something that could not have been prevented, and so we tend to downplay bad code.

Making systems reliable requires consensus. The hardware and software teams both need to be aligned on the total system design. They must work together from the requirements phase through development and on to test, as a single team that has reliability as its number one priority. Dewar adds, “To the definition of ‘critical embedded system,’ I would also add ‘are designed through teamwork to be highly reliable.’”

Initiatives drive improvements

Several initiatives exist to promote safety-critical design, and much of this work benefits critical embedded systems. The Open-DO initiative is one example led by the software community. Open-DO (as in “Open” and “DO-178C”) is an open source initiative that aims to create a cooperative and open framework for the development of certifiable software (www.open-do.org).

The Reliability Community is a collaborative effort by VITA members to develop a series of standards and guidelines that establish reliability practices for the critical embedded systems industry. The community is composed of representatives from electronics suppliers, system integrators, and the Department of Defense (DoD). These members have developed community-of-practice documents that define electronics failure rate prediction methodologies and standards (see http://opsy.st/vitareliability).

Critical embedded systems: The cornerstone of the future

Future computing applications will become even more dependent on critical embedded systems as they reach further into autonomous systems. These systems represent the next great step in the fusion of machines, computing, sensing, and software to create intelligent systems capable of interacting with the complexities of the real world. They must be “able” in many dimensions, and so will often need to be critical embedded systems.