Suppose you are part of a privately held startup that has just purchased an existing satellite-based telecommunications network and need to update the proprietary RTOS running on the satellites. The network is the only provider of voice telephony and data communications for many parts of the globe, and its customers are involved in mission-critical endeavors, so a failed satellite could leave a customer stranded in a remote location with no means of calling for help. The ramifications of such a failure become clearer when considering the network’s customers: the U.S. DoD, aircraft, ocean-going vessels, and companies using the system to track and manage remote assets.
As a startup, the only resources you have are those acquired as part of the satellite network. In this case, the constellation consists of 66 satellites in low Earth orbit, several additional “hot spare” satellites in slightly different orbits, the team of engineers who originally built the software for the satellites, and three engineering test stands that are more or less identical to the satellites when launched. Just to make things interesting, you don’t have the development hardware used when the satellites were built. Additionally, during the years the satellites have been in orbit, various hardware failures have occurred in the electronics of a number of the satellites. The result: No two satellites’ failure sets are identical.
All of this can be summarized with the question: “How do you develop software for a complex heterogeneous system to which you don’t have physical access?” This is the exact situation that the newly formed Iridium Satellite LLC faced in 2000. Clearly, the traditional solution of building more than 70 hardware mockups was outside of the budget. A virtual platform provided the best remedy, addressing the issues of both software development for legacy systems and space environment challenges.
Software development for legacy systems
One of the major life-cycle challenges of any computer-based system is keeping the software updated, and the Iridium satellite system was no exception. Whether the reason for an update is simply to fix bugs, add new functionality, or deal with a degraded state of the hardware platform, the challenge remains the same. In all cases, new code must be written, compiled, debugged, tested, and in some cases certified.
In the resource-constrained environment of a satellite, a machine-independent virtualization layer such as Java is simply out of the question, which means that the code is likely to be heavily dependent on the specific hardware in use. While many terrestrial embedded project engineers have chosen to use a common operating system such as Linux, which allows PC-based development and test, this violates the “test as you fly, fly as you test” motto held dear by many flight systems developers. Thus, a development system that is very similar to the flight hardware is necessary as part of the software development process.
In the case of a budget-constrained satellite network where the original development hardware is no longer available, the problems are magnified. While the satellites’ processor, a Motorola PowerPC 603e, and most of the components such as the system controller, memory controller, DRAMs, and bus controllers are still being manufactured, many are only used in legacy embedded designs, so development/evaluation systems are no longer common. Specialized components such as the L-band radios and the hardware to allow software to control which radio is connected to which antenna are even harder to find.
The satellites’ basic digital hardware is COTS-based and relatively similar to a first-generation Apple Power Macintosh. The traditional approach of buying a recent Macintosh and adding boards, however, has spurred a spirited debate as to whether the result is “close enough” for development and testing of flight-certified software. The question is becoming increasingly moot anyway, as Apple has switched from Motorola/Freescale PowerPC processors to x86 processors in new machines. It is unclear whether Apple’s acquisition of P.A. Semi signals an intent to switch back to PowerPC, but even if it does, P.A. Semi never built a 603e.
With all these factors in mind, the Virtutech Simics virtual platform was chosen as the development system for the satellites, and a virtualized software development methodology was put in place to maximize the benefits of the platform. The virtual platform models the exact current state of the target system’s components rather than approximating them with COTS parts, which avoids the whole “close enough” question. Because the virtual platform model included the full satellite and not just an instruction set simulator or other subset, the full software stack could be developed and tested, ensuring that engineers could “fly what they test.” Table 1 summarizes the features of virtual platform, COTS, and test stand based solutions.
Satellite environment challenges
Beyond the challenges of software development for legacy embedded systems, the harsh space environment posed further problems for the satellites. One issue was Single Event Upsets (SEUs), which, combined with the satellites’ limited shielding, produced an environment where electronic components occasionally suffered permanent failures. (At present, a number of the satellites have suffered at least one permanent failure such as the loss of a radio, antenna direction controller, or DRAM bank.) Because the target system comprised a constellation of satellites rather than a single satellite, the challenge was multiplied: no two satellites were likely to have exactly the same random failure set.
During the system’s original development, a single system image could be built and loaded onto all satellites. At the time of our case study, only a limited subset of the hardware was known to be failure free on all satellites and still powerful enough to be considered a complete system. Thus, the constellation could no longer be considered a single target replicated many times, but rather a collection of similar targets, each of which needed a custom software load to account for its individual failure set.
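The idea of a shrinking common hardware subset can be illustrated with a few lines of set arithmetic. The following is a minimal sketch only: the component names, satellite IDs, and failure sets are invented for illustration and do not describe the actual constellation.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical component inventory; names are illustrative only.
ALL_COMPONENTS: FrozenSet[str] = frozenset(
    {"radio_a", "radio_b", "antenna_ctrl", "dram_bank_0", "dram_bank_1"}
)

# Hypothetical failure sets, keyed by an invented satellite ID.
FAILURES: Dict[str, Set[str]] = {
    "sat-01": {"radio_b"},
    "sat-02": {"dram_bank_1"},
    "sat-03": set(),
}


def healthy(sat_id: str) -> FrozenSet[str]:
    """Components still working on one satellite."""
    return ALL_COMPONENTS - FAILURES[sat_id]


def common_healthy() -> FrozenSet[str]:
    """Components that are failure free on every satellite."""
    result = ALL_COMPONENTS
    for sat_id in FAILURES:
        result &= healthy(sat_id)
    return result
```

In this toy data, every new failure anywhere in the constellation can only shrink the result of `common_healthy()`, which is why a single image built for the common subset eventually stops being attractive and per-satellite loads built from `healthy(sat_id)` become necessary.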
A second challenge of operating in the space environment was physical access. In a terrestrial environment it might be possible to simply reload the system if a new OS image doesn’t work, but with the satellites 485 miles away, engineers couldn’t just hit the reset button. This sort of problem is traditionally solved by uploading a new OS image without deleting the old one and setting a watchdog timer that automatically reboots into the old image if the new image doesn’t finish booting and clear the timer. In this case, several of the satellites’ failure sets included a degradation of storage such that there was no longer room for two complete images. And if a satellite failed to reboot, engineers had no way to move it out of its orbit to bring in a spare. The result would be a dead spot in the global network, leading to a critical communications failure for a customer with no backup.
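The watchdog scheme described above, and the way it breaks down when storage can no longer hold two images, can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the satellites’ actual boot logic: the function name, the tick-based timer, and the `old_image_present` flag are all hypothetical.

```python
from typing import Callable


def boot_with_watchdog(
    new_image_boots: Callable[[int], bool],
    timeout_ticks: int,
    old_image_present: bool = True,
) -> str:
    """Simulate one boot attempt guarded by a watchdog timer.

    new_image_boots(tick) returns True once the new image has finished
    booting and cleared the timer (hypothetical callback).
    """
    for tick in range(timeout_ticks):
        if new_image_boots(tick):
            return "new"  # timer cleared before it expired
    # Watchdog expired: fall back only if the old image is still stored.
    return "old" if old_image_present else "unrecoverable"
```

For example, `boot_with_watchdog(lambda tick: False, 10, old_image_present=False)` returns `"unrecoverable"`, which corresponds to the dead-spot scenario for satellites that no longer have room for a second image.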
Iridium Satellite LLC, a smaller company employing fewer engineers than the original producer of the constellation, addressed these concerns by again using a virtual platform to model the current state of each satellite. This was particularly relevant because some of the target systems in question had suffered permanent hardware failures, and few COTS parts support selective functionality degradation. Additionally, costs were dramatically lower because a single virtual platform could be configured to match any of the satellites, whereas a different physical platform would have needed to be maintained for each.
Since the virtual platform included the ability to replicate the exact failure set of each target, extensive testing could be conducted to ensure that the reboot would always work. As an example, consider the situation where, upon power-up, the software probes each subsystem to determine the current state of the hardware. Did every developer of every subsystem write the code in such a way that the system can keep booting if that subsystem has failed? Or did they decide that the subsystem was sufficiently important that any failure would be critical anyway and thus not implement a workaround? Traditionally this is guarded against with a watchdog timer triggering automatic failover to a known-good software configuration; however, some of the satellites no longer had enough storage for that known-good configuration. With the virtual platform, every subsystem and every combination of subsystems could be made to “fail,” thus allowing for complete coverage testing.
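The coverage testing described above amounts to enumerating every combination of failed subsystems and attempting a boot under each one. The sketch below shows that enumeration only. The subsystem names are invented, and `boot_ok` is a hypothetical stand-in for launching the real image on a configured virtual platform; it is not a Simics API. It deliberately contains the kind of hidden assumption the text warns about, namely that at least one radio must work.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List, Sequence

# Hypothetical subsystem names, for illustration only.
SUBSYSTEMS = ("radio_a", "radio_b", "antenna_ctrl", "dram_bank_1")


def failure_sets(subsystems: Sequence[str]) -> Iterator[FrozenSet[str]]:
    """Yield every combination of failed subsystems, from none to all."""
    for k in range(len(subsystems) + 1):
        for combo in combinations(subsystems, k):
            yield frozenset(combo)


def boot_ok(failed: FrozenSet[str]) -> bool:
    # Stand-in for booting on a virtual platform with `failed` injected.
    # Assumed bug: boot code treats "no working radio" as fatal.
    return not {"radio_a", "radio_b"} <= failed


def find_boot_failures(
    boot: Callable[[FrozenSet[str]], bool]
) -> List[FrozenSet[str]]:
    """Return every failure set under which the boot does not complete."""
    return [fs for fs in failure_sets(SUBSYSTEMS) if not boot(fs)]
```

The number of combinations doubles with each subsystem added, so this kind of sweep is only practical when injecting a failure is a configuration change rather than a hardware modification.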
Virtualization enables global communication
For Iridium Satellite LLC, the answer to developing legacy software without access to the target hardware was a virtualized software development process combined with a virtual platform. This combination has given a dedicated team of engineers the tools they need to maintain and improve an aging constellation of satellites with higher productivity and lower costs than were achieved during initial development.
Ross Dickson is principal technology specialist at Virtutech. He has worked with numerous customers to implement virtualized software development platforms and introduce their use across the enterprise. Before joining Virtutech, Ross investigated multiprocessor memory system design at the University of Wisconsin-Madison.
Virtutech
408-392-9150