The world of high-performance computing is particularly challenging, as new flavors of sophisticated sensors, complex cameras, and even changing political or social environments drive demand for the latest and greatest “run-faster, jump-higher” processing wares. Advanced computational workloads such as Software-Defined Radio, cryptography, and other arithmetic-intensive algorithms are valued in R&D endeavors. And great progress has been made by using supercomputers, computer grids, clusters, clouds, gangs, and other forms of networked or otherwise connected compute nodes for such work. This is all well and good, until one wants to use this technology in real-world applications, particularly in deployed military environments such as unmanned vehicles or soldier-worn computers, where power consumption requirements – and certainly the weather – come into play.
Consider, for example, the real-world environment of a helicopter brownout, where billows of dust or sand kicked up by the helicopter rotors during takeoff and landing obstruct the visual cues necessary for safe operation. Real-time synthetic vision systems deployed on the aircraft might assist pilots in these situations, but these systems must be designed to operate within the aircraft’s power budget. It would be terrific if deployed high-performance computing requirements were simply solved with a one-size-fits-all system that was intrinsically power-efficient for any application and deployable in any environment.
This miracle of a system does not exist, and so the challenge remains: What kind of processing can best empower a military embedded system to meet supercomputing requirements that change nearly in real time, over product life cycles that could span decades? While no single processor technology serves as the best remedy on its own, modern Graphics Processing Units (GPUs) provide a viable one when combined with their computing “rivals” – the General Purpose Processor (GPP), the DSP, and the FPGA. The variables of power consumption, environment, computing performance, and life cycle are examined here for these four processing technologies, and an example of a GPU-based system illustrates the principle.
Traditional GPPs
GPPs, also known as “traditional CPUs,” feature ever-improving performance and well-understood, mature software development tools – and are available in many form factors. The downsides of using GPPs in embedded high-performance applications can include limited product lifespans brought about by end-of-life commercial components or changes in platform support, along with the latency issues that are always a concern in real-time applications (particularly vehicle systems or soldier-worn equipment). Meanwhile, environmental concerns can result in thermal issues and reduced quality of service in cold temperatures, and power draw can run high.
High-throughput DSPs
One alternative to the GPP is the classic DSP, which typically offers lower-power, low-latency components with high-throughput potential. These are all good things, except for the pain of learning the development tools and the slower relative processing performance, both of which mean that DSPs are not always practical for real-world military applications. Networks of DSPs are a tried-and-true solution for parallel processing requirements, but they magnify the shortcomings of the technology in general. Long development cycles abound, and deployment and maintenance can be difficult, sometimes limiting the usefulness of the technology. Furthermore, DSPs, like other processors, are becoming more power-hungry as their performance increases, meaning that heat dissipation and power usage must be addressed.
Reconfigurable FPGAs
Another alternative to the GPP is the FPGA, which has found a niche in the high-performance computing world, often utilized as a coprocessing device to massage data. The FPGA’s inherent parallel architecture and performance – in terms of processing power, latency, and throughput – are well suited to many types of mission-critical signal processing applications. The field-programmable nature of the device is also highly beneficial, as updates can be implemented in near real time.
Although excessive power consumption can be an issue with FPGAs in embedded applications, power usage can usually be managed to meet a given technical requirement. However, FPGAs are not always available in extended-temperature or rugged packaging, limiting their use in systems designed for harsh environments. FPGAs can also offer longer product life cycles: if a particular device reaches end of life, functional deployed application firmware can usually be migrated to a newer part with little additional effort.
Flexible GPUs
In contrast, GPUs are traditionally tasked with compute-intensive, floating-point graphics functions such as 3D rendering and texture mapping. However, some modern GPUs are structured much like parallel-architecture supercomputers and are being used for numerical, signal processing, physics, general scientific, or even statistical applications – all of which might be viable applications on the battlefield.
Programming tools developed for this purpose, essentially extensions of the ubiquitous C and C++ (and, more recently, Fortran) programming languages, leverage GPU parallel compute engines to solve complex computational problems. These are typically highly parallelizable problems, which the GPU can solve in significantly shorter timeframes – in some cases 100x faster – than a traditional CPU. This computing paradigm is called General-Purpose computing on Graphics Processing Units (GPGPU).
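To make the GPGPU programming model concrete, the sketch below shows a minimal CUDA C program – a hypothetical illustration, not code drawn from any system described in this article – that offloads a simple SAXPY operation (a building block of many signal processing routines) to the GPU, with each GPU thread computing one element of the result:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Each GPU thread computes one element of y = a*x + y, so the loop a
   CPU would run serially is spread across thousands of lightweight threads. */
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;                 /* one million samples */
    size_t bytes = n * sizeof(float);

    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Allocate device memory and copy the input data to the GPU. */
    float *d_x, *d_y;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    /* Launch enough 256-thread blocks to cover all n elements. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, d_x, d_y);

    /* Copy the result back and spot-check it. */
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 5.0)\n", y[0]);

    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}

Written for a CPU, the same operation would be a single serial loop; distributing it across thousands of GPU threads is what yields the large speedups available to well-suited, highly parallel algorithms.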
Figure 1 depicts a traditional CPU versus two generations of NVIDIA GPUs, measured in iterations per second. Test 1 and Test 2 are two algorithms from a well-known benchmark suite; both benefit from porting to the GPGPU, but to differing degrees. One can conclude that, for well-selected algorithms, the degree of improvement over a pure CPU implementation justifies the effort of porting. A perfect multicore port would multiply the leftmost (CPU) column by a small integer related to the number of CPU cores available, whereas the GPGPU results are several orders of magnitude better and appear to be improving with each GPU generation at a rate that CPUs are unlikely to match.
Additionally, GPUs are available in extended-temperature and rugged packages, making them suitable for deployment on airborne or other environmentally challenging platforms. The projected GPU lifespan can be limited, but with careful material planning this can be managed. As with GPPs, care must also be taken with power management and heat dissipation, particularly in small form factor systems.
Enter a flexible GPU system architecture
So back to our original question: What kind of “miracle” processing system can best provide supercomputing processing requirements that change nearly in real time over product life cycles that could span decades – while taking power consumption and environmental concerns into consideration?
The answer: A system of GPUs combined with traditional CPUs, DSPs, or FPGAs. Such a “miracle” system would allow a developer’s specific signal processing or other algorithms to deploy effectively. These GPU-based systems can be designed to deliver large amounts of usable processing, whether measured in operations per second (OPS), floating-point operations per second (FLOPS), or teraflops. Moreover, the architecture could comprise a highly customized suite of 6U VPX boards, consisting of one or more compute nodes, an input/output board, an InfiniBand switch, and a management node, all housed in a conduction-cooled chassis and supported by a Linux-based operating system. Estimated system-level performance could then be as high as 1.55 gigaflops per watt (GF/W) at theoretical peak/Thermal Design Point (TDP). Fully populated, a GPU-based high-performance computing system could provide a theoretical 1.94 TFLOPS of floating-point computation in a physical package of less than 2 cubic feet, while drawing only slightly more than 1 kilowatt.
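As a quick sanity check on these figures (assuming the 1.55 GF/W rating applies to the fully populated 1.94 TFLOPS configuration at theoretical peak/TDP):

1.94 TFLOPS ÷ 1.55 GFLOPS per watt ≈ 1,250 W

which is consistent with the stated power draw of slightly more than 1 kilowatt.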
Such a GPU-based high-performance computing system could additionally include a 6U VPX carrier card, rendering it a configurable Single Board Computer (SBC) with graphics capability. These flexible SBCs could then be installed in a chassis with forced-air cooling, or in a rugged conduction-cooled chassis suitable for harsh environments such as airborne or naval deployment.
As technology advances, the COTS modules can be upgraded, extending the system’s life cycle. Because the deployed form factor does not need to change upon upgrade, deployed systems can be quickly upgraded – or even downsized – to fit power budgets or other environmental constraints. An example of a GPU-based high-performance system suitable for military deployment is Quantum3D’s Katana system, based on the Tanto compute node.
Flexibility is good
Leveraging GPUs – combined with various compute options including GPPs, DSPs, and/or FPGAs – on a rugged and flexible hardware platform is a step in the right direction toward creating a “miracle” supercomputing platform for modern military systems. The ability to upgrade boards or modules in a rugged chassis extends the product’s service life and allows adaptation to environmental factors. As processing capabilities improve and technologies such as GPGPU progress, the ability to design within an already-qualified VPX chassis, such as Quantum3D’s Tanto VPX system (Figure 2), is an added benefit.
Quantum3D, Inc. 408-600-2595 www.quantum3d.com