VITA Technologies
Articles

General-purpose GPUs breathe new life into high-performance embedded computing

Anne Mascarin, Mercury Computer Systems, June 1, 2010

Many advanced high-performance embedded computing applications demand enormous amounts of computing power. Real-time imaging systems for persistent surveillance, electronic warfare, and similar applications require the highest possible GFLOPS/Watt to meet performance requirements without exceeding the power budget. Traditional CPU-based boards simply don't meet these constraints.

General-Purpose GPUs (GPGPUs) are currently being used in high-performance embedded computing applications where GFLOPS/Watt metrics are paramount. Before deciding whether to embark on the CPU or GPGPU route, however, it’s important to explore the differences between GPUs and GPGPUs – and to understand how these GPGPUs (as opposed to CPUs) are a natural fit for high-performance embedded computing applications.

The rise of the GPU versus CPU equation

Government programs are putting the squeeze on prime contractors to develop more warfighting capability faster. At the same time, the needs of embedded defense computing platforms are accelerating: to acquire more data and arrange and process it more quickly, with the goal of extracting actionable information immediately and making it available in real time to the warfighter. The need for creative and innovative solutions to the “actionable information” problem has never been stronger.

Government agency mandates and the requirement for actionable information aren’t the only pressures that affect prime contractors. Consider Size, Weight, and Power (SWaP) and historical constraints that greatly impact the adoption and performance of deployed platforms. Together, these issues force prime contractors to turn to innovative solutions in order to squeeze every ounce of performance out of their subsystems.

GFLOPS/Watt matters

Real-time imaging systems in deployed environments such as persistent surveillance, onboard exploitation, and electronic warfare require the highest possible GFLOPS/Watt to meet deployed performance requirements. Frequently, these subsystems are the last to be added to the airframe and are consequently allotted the smallest portion of the platform's power budget.

Many CPU-based boards can't keep up with these stringent GFLOPS/Watt requirements. For example, the peak theoretical efficiency of the IBM Cell processor is 1 GFLOPS/Watt, while that of AMD's ATI RV770 GPGPU is 9.23 GFLOPS/Watt. Graphics Processing Units (GPUs), first introduced by NVIDIA in 1999, have always had very high GFLOPS/Watt metrics. Although the GFLOPS/Watt value increases with every new chip revision, GPUs traditionally performed best in the application for which they were designed: graphics processing in desktop systems.
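The GFLOPS/Watt metric itself is simple arithmetic: peak throughput divided by power draw. The sketch below reproduces the comparison above; the peak-rate and power figures are illustrative assumptions chosen so the ratios match the article, not vendor specifications.

```python
# Back-of-the-envelope GFLOPS/Watt comparison.
# GFLOPS/Watt = peak GFLOPS / power draw in Watts.

def gflops_per_watt(peak_gflops: float, power_watts: float) -> float:
    """Return the peak GFLOPS/Watt efficiency metric."""
    return peak_gflops / power_watts

# Hypothetical peak-rate/power pairs chosen so the ratios match the text.
cell = gflops_per_watt(peak_gflops=100.0, power_watts=100.0)    # ~1 GFLOPS/W
rv770 = gflops_per_watt(peak_gflops=1200.0, power_watts=130.0)  # ~9.23 GFLOPS/W

print(f"Cell:  {cell:.2f} GFLOPS/Watt")
print(f"RV770: {rv770:.2f} GFLOPS/Watt")
```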

Serial or parallel processing?

GPUs are architected to maximize arithmetic and logic performance on one type of highly self-similar data. CPUs, on the other hand, offer more support for controlling and managing data, such as flow control and caching. In general, CPUs operate on data in a serial fashion; for example, even when a matrix operation is performed, the CPU must perform the overhead task of loading each input element sequentially. CPUs were architected to support flow control for comparisons and decision making, in addition to calculations. Conversely, GPUs process data in parallel because they contain a matrix of many simple Arithmetic Logic Unit (ALU) cores that rapidly perform simple calculations simultaneously. This high degree of parallelism is what makes GPUs efficient and fast image processing engines for high-performance military applications.
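The serial-versus-parallel distinction can be sketched in plain Python (no GPU involved). The CPU-style routine walks the data one element at a time; the GPU-style routine expresses the same elementwise add as independent per-index work, which a GPU's ALU array could execute simultaneously. Both function names are illustrative, not from any real API.

```python
# CPU style: walk the flattened matrices one element at a time.
def add_serial(a, b):
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])   # each step waits for the previous one
    return out

# GPU style: define per-index work; every index is independent of the others.
def add_parallel_style(a, b):
    def kernel(i):                # conceptually, one 'thread' per index
        return a[i] + b[i]
    return [kernel(i) for i in range(len(a))]  # a GPU would run these at once

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(add_serial(a, b))          # [11, 22, 33, 44]
print(add_parallel_style(a, b))  # [11, 22, 33, 44]
```

The outputs match; the difference is that nothing in the kernel formulation forces sequential execution, which is exactly the property the GPU's ALU matrix exploits.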


Programmability versus upgradeability

So, given their parallel performance potential and low power consumption, why haven’t GPUs been utilized much for high-performance embedded computing? Programmability is one reason, and upgradeability is another.

The software environment for GPUs is notoriously non-intuitive, even to proficient embedded programmers. The environment is based on graphics primitives, not high-level language constructs or even CPU assembly variants. And the basic structure of programming tools for GPUs does not offer the optimizations that programming languages for CPUs do. GPGPU computing is a relatively recent development that offers a developer-friendly software environment: software developers can now program GPGPUs with familiar constructs such as well-defined APIs and indexed matrix operations.
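The "familiar constructs" pattern can be sketched as follows: the developer writes a kernel addressed by matrix indices, and a launcher maps it across the whole index space. The `launch_kernel` helper here is hypothetical and merely simulates the model in Python; real GPGPU toolkits (NVIDIA's CUDA, for example) provide the equivalent in C-like syntax and actually fan the work out across the device.

```python
# Hypothetical launcher: runs kernel(row, col, *args) once per matrix
# element. On a real GPGPU these invocations execute in parallel.
def launch_kernel(kernel, rows, cols, *args):
    for r in range(rows):
        for c in range(cols):
            kernel(r, c, *args)

# Per-element work, addressed by row/column index - the "indexed matrix
# operation" style the text describes.
def scale_kernel(r, c, src, dst, factor):
    dst[r][c] = src[r][c] * factor

src = [[1, 2], [3, 4]]
dst = [[0, 0], [0, 0]]
launch_kernel(scale_kernel, 2, 2, src, dst, 10)
print(dst)  # [[10, 20], [30, 40]]
```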

Historically, GPUs have not been easily upgradeable; they have been discrete components soldered directly onto printed circuit boards, so upgrading the chip as new versions became available would require a complete board respin. Many of today's GPGPUs (from ATI, NVIDIA, and others), however, are available in the Mobile PCI Express Module (MXM) format: an easy-to-insert form factor that facilitates upgrades when new, faster GPGPUs become available. Adherence to the MXM specification, developed by NVIDIA and now a stand-alone specification, ensures easy upgradeability for technology updates.

GPGPUs: A natural fit for high-performance embedded applications

A high GFLOPS/Watt ratio, parallel processing capability, a programmable software environment, and upgradeability are all available with today's GPGPUs, and the application space for them in high-performance embedded defense computing is clearly defined.

As mentioned, several applications in the high-performance embedded computing space could benefit from the use of GPGPUs. Persistent surveillance – an unmanned aerial vehicle application characterized by long mission duration and onboard sensor data exploitation – is a particularly good example. The long mission duration aspect of persistent surveillance demands minimal power consumption. Meanwhile, the intense computational aspect of onboard exploitation, including image stabilization and geo-registration, requires parallel processing – such as that provided by GPGPUs – to provide real-time, actionable information to the warfighter.

The missing link is a platform or environment that can support experimentation and algorithm tradeoffs. One such link is Mercury's Sensor Stream Computing Platform (SSCP, see Figure 1), a 6U VXS development chassis that is the size of a piece of carry-on luggage, weighs 32 pounds, draws less than 600 W from a standard wall outlet, and achieves 3.84 TFLOPS (see Figure 2). The SSCP's tunable power/performance operation allows the user to dial down the GPU clock speed to minimize power consumption during periods of inactivity, as required for persistent surveillance and similar applications.
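The peak-FLOPS-versus-clock relationship behind this tunability is roughly linear: peak throughput is the product of the number of parallel ALUs, the operations each can issue per cycle, and the clock rate, so dialing the clock down trades FLOPS for power. All the specific numbers below are illustrative assumptions, not SSCP specifications.

```python
# Why clock scaling works as a power knob: peak rate scales linearly
# with clock, so reduced clocks give proportionally reduced (but cheaper)
# throughput.

def peak_gflops(alu_count: int, ops_per_cycle: int, clock_ghz: float) -> float:
    """Peak rate = parallel ALUs x ops issued per cycle x cycles per second."""
    return alu_count * ops_per_cycle * clock_ghz

# A hypothetical GPGPU with 800 ALUs, each issuing 2 ops per cycle:
for clock in (0.3, 0.5, 0.75):
    print(f"{clock} GHz -> {peak_gflops(800, 2, clock):.0f} GFLOPS")
```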


Figure 1: The Sensor Stream Computing Platform

Figure 2: Sensor Stream Computing Platform: Peak FLOPS versus GPU clock rate

Anne Mascarin is a Product Marketing Manager at Mercury Computer Systems, where she has been employed for five years. Previously, she worked at The MathWorks and Analog Devices, Inc. Anne holds a Master of Science in Electrical Engineering from Northeastern University and a Bachelor of Arts in Economics from Boston University. She can be contacted at [email protected].

Scott Thieret is the Technical Director for GPU Computing at Mercury Computer Systems, where he has been employed for 10 years in various positions dedicated to GPU development. Prior to Mercury, he worked at Avid, MITRE, and IBM. Scott holds a Bachelor of Science in Computer Engineering from the University of Vermont. He can be contacted at [email protected].

Mercury Computer Systems 866-627-6951 www.mc.com


© 2023 VITA Technologies. All rights Reserved.