Technology > Systems > Tools and Techniques for Reliable Sensor Network Software
Todd Millstein
Writing reliable software for sensor networks is particularly difficult because of the intrinsic nature of such software: distributed, timing-sensitive, event-driven, and closely coupled to hardware. In this project we are developing tools and techniques to analyze sensor network software in order to detect important classes of errors. Our goal is to eliminate these errors before they can manifest in a running system. Our technical approach, which is detailed in the next section, is one of staged checking: a sequence of checks that rely on increasing scope and knowledge are performed on the code at compile, load, and run times. The focus of our work thus far has been on checkers to ensure proper management of memory. The severe resource constraints and lack of memory protection on many sensor network platforms make memory errors all too common and costly. We are currently instantiating our checkers in the context of the SOS operating system for sensor nodes, and we plan to adapt the techniques we develop to the Tenet system in the future.
The goal of our approach is to detect errors as early as possible in the software development cycle. As such, staged checking is a natural fit, since it allows errors to be detected as soon as the relevant program information becomes available. First, many interesting checks can be performed statically, when each individual software module is compiled. For example, each SOS module must be checked to handle the INIT and FINAL messages, which are sent by the kernel in order to respectively initialize and destroy a module. Second, SOS supports dynamic loading of modules, and several system errors can be detected at module load/unload time. For example, it is possible for a module to be removed even though another module is still subscribing to a function published by that module. Finally, run-time checking can be used to trap errors right before they would actually occur. For example, run-time monitoring can detect if two modules ever attempt to simultaneously obtain exclusive access to some device, throwing an exception or invoking a user-defined error handler to recover at that point.
We are currently developing a staged checker called Lighthouse for ensuring proper memory management in SOS modules. Lighthouse consists of two stages. First, a static checker enforces a memory ownership protocol on each SOS module in isolation. The protocol ensures that each piece of dynamically allocated memory has a unique owning module, which has the sole right to manipulate the memory and the responsibility to eventually free the memory or transfer ownership to another module. To make static checking more precise, we allow programmers to specify a finite-state machine (FSM) for each SOS module, which specifies the expected ordering of the module’s events at run time. To validate these FSMs, a run-time stage of Lighthouse performs program monitoring to ensure that each module’s dynamic event trace conforms to its FSM. That second stage of Lighthouse is currently being implemented.
The static stage of Lighthouse is implemented in the CIL tool, which parses C code into a simple intermediate format upon which analyses are performed. Each SOS module can be analyzed in isolation, thereby providing early feedback for developers. The checker does not modify any code, so users can continue to compile their SOS modules with the C compiler of their choice. The checker is part of the standard SOS distribution and is also freely available at https://projects.nesl.ucla.edu/projects/lighthouse. We have evaluated the static checker on all historical versions of all user modules in the SOS CVS repository, as well as on the SOS kernel itself. The checker generated 25 warnings of which 8 were real errors when checking user modules and 35 warnings of which 2 were real errors when checking the kernel.
We are currently implementing Lighthouse’s run-time stage. We are developing a declarative language for specifying FSMs as well as a tool to instrument a program to monitor these FSMs dynamically. In future work, we plan to develop techniques for automatically inferring FSMs to reduce user burden, either through static analysis or through analysis of dynamic execution traces. We also plan to apply our staged approach to validate usage of resources other than dynamic memory, including stack space and energy consumption. Finally, we will adapt the approach taken in Lighthouse to ensure proper resource management in the context of the Tenet system.
Faculty:
Graduate Students: