Serial RapidIO addresses design challenges in critical embedded systems

VITA is the standards body that governs the VMEbus specification, a bus architecture that is prevalent in critical embedded systems. As performance requirements for these systems evolved, so did VITA’s specifications (see Figure 1, courtesy of VITA). At the foundation was the VME32 standard, which started out supporting a bandwidth of 40 MBps. Higher throughput requirements led to the VME64 and 2eSST protocols, which support up to 320 MBps. VITA now has VPX and VXS extensions that include high-speed serial interconnects such as Serial RapidIO. The key issues that designers are trying to address in critical embedded systems include performance, scalability, fault tolerance, peer-to-peer communications, and redundancy. Serial RapidIO is proving to be a viable solution to these system issues.

Performance: Key in critical systems

Critical embedded systems are life- or safety-critical systems with applications spanning the industrial, medical, and defense and aerospace markets. Recent advancements in each of these fields have raised the performance requirements for the computing systems that serve these applications. For example, medical imaging is being used to diagnose more ailments and with more precision. The embedded computing system inside such a medical imaging system must provide more processing power to meet the imaging system’s increased performance requirements. Raising the core frequencies of processors and DSPs in a host-centric architecture is no longer enough to keep up with these demands.

Therefore, designers must resort to a more distributed processing architecture. Distributing compute-intensive tasks across several processing elements yields better performance by parallelizing multiple operations. However, the use of numerous processing components (or endpoints) on a board also means that there are more elements contending for access to a central switch. Since the switch needs to manage data across these multiple endpoints, latency across a non-blocking switch becomes vital to maintaining system performance. The Serial RapidIO standard was conceived with that system issue in mind. The packet header was kept to a minimum to optimize transmissions across the link. The Transport Layer of the protocol was designed to make it easy for a switch to route traffic effectively and with minimal latency. For example, a switch only has to look at the Destination ID (DestID) to route a packet: a simple lookup on an 8- or 16-bit field is all that is needed to direct a packet to its destination. Cut-through latency of approximately 100 ns can be achieved.
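
The DestID lookup described above can be sketched as a small table that maps a destination ID to an output port. This is a minimal illustration only, not a model of any particular switch: the class name DestIdRouter, its methods, and the port numbers are hypothetical, and a real switch performs this lookup in hardware.

```python
from typing import Dict, Optional


class DestIdRouter:
    """Toy model of DestID-based routing (hypothetical names throughout)."""

    def __init__(self, id_bits: int = 8) -> None:
        # The article mentions 8- or 16-bit destination ID fields.
        if id_bits not in (8, 16):
            raise ValueError("id_bits must be 8 or 16")
        self.id_bits = id_bits
        self.table: Dict[int, int] = {}  # DestID -> output port

    def _check(self, dest_id: int) -> None:
        if not 0 <= dest_id < (1 << self.id_bits):
            raise ValueError("DestID does not fit in the ID field")

    def add_route(self, dest_id: int, port: int) -> None:
        self._check(dest_id)
        self.table[dest_id] = port

    def route(self, dest_id: int) -> Optional[int]:
        """Return the output port for dest_id, or None if no route is set."""
        self._check(dest_id)
        return self.table.get(dest_id)


router = DestIdRouter(id_bits=8)
router.add_route(0x12, port=3)
print(router.route(0x12))  # the single lookup a switch needs per packet
```

The point of the sketch is that the routing decision depends on one small field and one table access, with no inspection of addresses or payload.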

Scalability with little effort

The ability to easily modify a design to adhere to changes in system requirements (such as increased performance) is of paramount importance. Serial RapidIO designs can scale with little effort depending on the application. Serial RapidIO register definitions, for example, do not change as link speeds increase or decrease.

In contrast, Ethernet register sets differ with operating speed: the register set for GbE is not compatible with the 10 GbE register set. By allowing reuse of the same software at various speed grades, Serial RapidIO protects the investment made in that software and reduces the complexity of scaling a design as system performance requirements change.

The vital art of fault tolerance

Fault tolerance is vital to critical embedded systems. Consequently, the Serial RapidIO packet format was created with fault tolerance in mind. Control symbols are used for link initialization and synchronization, packet delineation, error conditions, and reliable delivery. Because control symbols are so central to the Serial RapidIO protocol, they are protected by a 5-bit Cyclic Redundancy Check (CRC) value. Bits that are not covered by CRC or parity are protected by other means. For example, the AckID field is protected by using rolling AckIDs. Packets are assigned sequential AckIDs and are acknowledged in that same order. An AckID that falls outside the expected rolling sequence indicates that an error has occurred. Packets are also protected by an end-to-end cyclic redundancy check. When a CRC error is detected, the receiver sends a "packet not accepted" control symbol to the transmitter, causing it to resend the packet.
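
The rolling AckID and CRC checks described above can be sketched as follows. This is an illustration under stated assumptions rather than the specification's exact behavior: the class LinkReceiver is hypothetical, the 5-bit AckID width is assumed, and Python's binascii.crc_hqx (a 16-bit CCITT-style CRC) is used as a stand-in for the packet CRC.

```python
import binascii

ACKID_BITS = 5                 # assumption: width of the rolling AckID field
ACKID_MOD = 1 << ACKID_BITS    # AckIDs wrap around after this many packets


def packet_crc(payload: bytes) -> int:
    """Stand-in packet CRC: 16-bit CCITT-style CRC seeded with 0xFFFF."""
    return binascii.crc_hqx(payload, 0xFFFF)


class LinkReceiver:
    """Toy receiver that enforces in-order AckIDs and a valid CRC."""

    def __init__(self) -> None:
        self.expected_ackid = 0

    def receive(self, ackid: int, payload: bytes, crc: int) -> str:
        # An AckID outside the rolling sequence signals an error.
        if ackid != self.expected_ackid:
            return "packet-not-accepted"
        # A CRC mismatch also makes the transmitter resend the packet.
        if packet_crc(payload) != crc:
            return "packet-not-accepted"
        self.expected_ackid = (self.expected_ackid + 1) % ACKID_MOD
        return "packet-accepted"


rx = LinkReceiver()
data = b"sensor frame"
print(rx.receive(0, data, packet_crc(data)))      # in order, CRC good
print(rx.receive(2, data, packet_crc(data)))      # AckID skipped ahead
print(rx.receive(1, data, packet_crc(data) ^ 1))  # corrupted CRC
```

In this sketch a rejected packet leaves the expected AckID unchanged, so the transmitter's resend with the same AckID is accepted once it arrives intact.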

Efficient peer-to-peer communications

As mentioned earlier, increased performance requirements have created the need for more distributed processing elements within a system. By adopting a target ID-based routing system, Serial RapidIO has greatly simplified peer-to-peer communications (Figure 2). For example, in PCI-based systems, both sides of a bridge or switch need to have knowledge of the memory map. In contrast, each endpoint in a Serial RapidIO-based system has a separate memory map. This allows direct memory reads and writes without the need to go through address translation. Not only does this reduce overhead, it also reduces complexity by eliminating the need for software to configure address translation across the system.
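
The idea of separate per-endpoint memory maps can be sketched as below. This is a simplified model, assuming hypothetical names (MemoryEndpoint, MemoryFabric) and byte arrays standing in for device memory: a transfer names a target ID plus an offset in that target's own map, so no shared, system-wide map is consulted.

```python
from typing import Dict


class MemoryEndpoint:
    """Toy endpoint that owns a private memory map."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)


class MemoryFabric:
    """Toy fabric that delivers reads and writes by target ID."""

    def __init__(self) -> None:
        self.endpoints: Dict[int, MemoryEndpoint] = {}

    def attach(self, dest_id: int, endpoint: MemoryEndpoint) -> None:
        self.endpoints[dest_id] = endpoint

    def write(self, dest_id: int, offset: int, data: bytes) -> None:
        # The offset is interpreted in the target's own map only.
        mem = self.endpoints[dest_id].memory
        if offset < 0 or offset + len(data) > len(mem):
            raise ValueError("write falls outside the target's memory map")
        mem[offset:offset + len(data)] = data

    def read(self, dest_id: int, offset: int, length: int) -> bytes:
        mem = self.endpoints[dest_id].memory
        if offset < 0 or length < 0 or offset + length > len(mem):
            raise ValueError("read falls outside the target's memory map")
        return bytes(mem[offset:offset + length])


fabric = MemoryFabric()
fabric.attach(1, MemoryEndpoint(16))
fabric.attach(2, MemoryEndpoint(16))
fabric.write(1, 0, b"\xAA\xBB")  # offset 0 on endpoint 1
fabric.write(2, 0, b"\x11")      # the same offset on endpoint 2 is independent
```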

Furthermore, messaging, or the ability of two endpoints to exchange data without requiring knowledge of each other’s address space, comes into play. The Serial RapidIO protocol uses a "push" architecture (which means the receiver is responsible for storing the message in its memory system) to implement messaging. All verification of correct message data passing is done in hardware, without the need for software intervention. In contrast, PCI Express employs a more arduous, software-dependent process for messaging that not only adds to latency, but also increases system complexity. For example, in PCI Express-based systems, designers need to request permission and specify the location to which a message is sent. With Serial RapidIO, the message is simply sent to a mailbox and the receiver is responsible for storing it in the proper location.
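
The push-style mailbox model described above might look like the following sketch. The names MessageEndpoint and MessageFabric are hypothetical, the count of four mailboxes is an assumption made for illustration, and the hardware verification the article mentions is not modeled.

```python
from typing import Dict, List

NUM_MAILBOXES = 4  # assumption: number of mailboxes per endpoint


class MessageEndpoint:
    """Toy receiver that decides for itself where a message is stored."""

    def __init__(self, dest_id: int) -> None:
        self.dest_id = dest_id
        self.mailboxes: Dict[int, List[bytes]] = {
            i: [] for i in range(NUM_MAILBOXES)
        }

    def on_message(self, mailbox: int, data: bytes) -> None:
        # Storage is entirely the receiver's job in a push architecture.
        if mailbox not in self.mailboxes:
            raise ValueError("unknown mailbox")
        self.mailboxes[mailbox].append(bytes(data))


class MessageFabric:
    """Toy fabric that delivers messages by target ID and mailbox number."""

    def __init__(self) -> None:
        self.endpoints: Dict[int, MessageEndpoint] = {}

    def attach(self, endpoint: MessageEndpoint) -> None:
        self.endpoints[endpoint.dest_id] = endpoint

    def send_message(self, dest_id: int, mailbox: int, data: bytes) -> None:
        # The sender supplies no destination address, only ID and mailbox.
        self.endpoints[dest_id].on_message(mailbox, data)


fabric = MessageFabric()
receiver = MessageEndpoint(dest_id=7)
fabric.attach(receiver)
fabric.send_message(dest_id=7, mailbox=2, data=b"hello")
```

Note that send_message takes no address argument: the sender never needs to know the receiver's address space.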

Redundancy increases mission success

Another particularly important requirement for critical embedded designs is the ability to keep the system functioning even if one component within that system fails. Numerous topologies can be used to build redundancy in a system, including star and full mesh (Figure 3).

A star topology is better suited to larger, fault-tolerant systems and is easy to expand, making it more scalable. Since all data goes through a central point, a switch that is equipped with strong error management, statistics gathering, and fault tolerance features makes it easy to troubleshoot and manage the system.

On the other hand, a full mesh topology connects all devices to each other for fault tolerance and redundancy. Connecting each of the nodes to every other node has its advantages. If one link fails, data can flow through any of the other links to reach its destination.

Serial RapidIO supports these topologies through its system discovery, initialization, and configuration options, as well as support for multiple hosts. This gives designers the freedom to tailor their architecture to the topology that yields the most benefits for their particular system. While other interconnect protocols such as PCI Express also support these topologies, they are more complex in terms of software and memory map management.
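
The topology trade-off can be made concrete with a small graph sketch. It is an illustration only, with hypothetical function names and node labels: it counts the links each topology needs and checks that a full mesh still connects two nodes after their direct link fails.

```python
from collections import deque
from itertools import combinations


def star_links(endpoints, switch="switch"):
    """One link from each endpoint to a central switch."""
    return {frozenset((switch, e)) for e in endpoints}


def mesh_links(endpoints):
    """One link between every pair of endpoints."""
    return {frozenset(pair) for pair in combinations(endpoints, 2)}


def reachable(links, src, dst):
    """Breadth-first search over the surviving links."""
    adjacency = {}
    for link in links:
        a, b = tuple(link)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


nodes = ["A", "B", "C", "D"]
star = star_links(nodes)
mesh = mesh_links(nodes)
print(len(star), len(mesh))  # link counts for four endpoints

# Fail the direct A-B link in the mesh; traffic can still go via C or D.
degraded = mesh - {frozenset(("A", "B"))}
print(reachable(degraded, "A", "B"))
```

For n endpoints the star needs n links while the full mesh needs n(n-1)/2, which is why the star expands more easily and the mesh offers more alternate paths.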

Serial RapidIO meets future distributed processing trends

The Serial RapidIO protocol was designed with distributed processing in mind. Serial RapidIO’s features maximize performance, scalability, fault tolerance, peer-to-peer communications, and redundancy, and allow designers to build powerful systems with ease. Future trends in aerospace and defense, industrial, and medical applications are moving toward ever higher performance and even higher reliability. Tundra’s industry-standard Serial RapidIO switches facilitate building reliable and scalable high-performance systems and can minimize time to market.

Kashif Hasni is a product manager at Tundra Semiconductor Corporation. Kashif holds a B.Eng. in Systems and Computer Engineering. He has nine years of experience in ASIC design and verification and more than three years of experience in product management.

Tundra Semiconductor Corporation
613-592-0859
www.tundra.com