Model and Data Uncertainty in Sensor Networks
Many activities in CENS boil down to constructing and testing models of physical phenomena, with the goal of minimizing the resources required to obtain reliable answers to scientific questions. From this simple statement flow many practical and theoretical difficulties. The question addressed in this project is how we can be certain that either the data or the model is sufficiently reliable or, put another way, what resources and validation activities are required for the data or model to be trustworthy. Two situations are illustrated in Figure 1. In the data uncertainty problem illustrated in Figure 1a), nodes observe some source and report results to a fusion center. They may also communicate with each other in order to learn correlations in the observations, which may be used either for source coding or for checking the reliability of other nodes. In general, a redundant deployment (beyond the minimum for Nyquist rate sampling) is required in order to do this checking. A mobile audit node, which is assumed to be reliable, may periodically make a circuit of a region to provide further observations and possibly calibrate the nodes. Among the problems to be investigated in this new project are the relations among measurement correlations, deployment redundancy, audit frequency and trajectory, and the reliability of the fused decisions.

Figure 1: Illustrations of data and model uncertainty problems
Figure 1b) illustrates one example of the model uncertainty problem, in which the question is the minimum number of sensors that must observe a field to determine whether it was generated by one source or by more than one. Thereafter, if the scientific objective is to track the sources, far fewer nodes need to be active to monitor the discrete sources than would be required to reproduce the field. Here as well, mobile nodes might be involved in auditing performance. More generally, the issue is the degree of certainty in a model constructed from some set of measurements. The joint data collection and model construction problem, which merges both of these problem types, is the ultimate objective.
Neither of the problems described in the overview can be solved in its most general form. Our approach has been to begin with simple mathematical models (of sources, sensors, and the environment) and with interactions involving a small number of nodes, so that the problems at least lead to tractable optimizations and also suggest heuristics that may apply to larger collections of nodes or more difficult models.
The data uncertainty problem has already been the subject of considerable research within CENS, from the perspectives of calibration, adaptive sampling, and reputation systems. This project is distinguished from that work in being concerned explicitly with fundamental limits on the deployment redundancy and on the information flows required to achieve a given level of fidelity, and in using information-theoretic measures to define that fidelity. The model uncertainty problem is related to the multi-scale sampling problems in that it is concerned with the information required to model at particular levels of abstraction. It is also clearly related to adaptive sampling problems, since it is envisioned that resources will be deployed incrementally, testing at each stage whether there is sufficient information to progress to a different model level. Models may of course be constructed only in particular contexts; thus, while the goal is to produce procedures that apply to a variety of situations, this work will more realistically result in a toolbox of algorithms that deal with a limited number of practical problems.
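As a rough illustration of how deployment redundancy relates to the reliability of a fused decision, consider a deliberately simplified setting that is not taken from the project itself: n sensors each make a binary decision, each is correct independently with probability p, and the fusion center takes a majority vote. The function name `majority_reliability` and the value p = 0.8 are assumptions chosen for this sketch. Real deployments have correlated errors and richer fusion rules, so this is only a baseline.

```python
from math import comb

def majority_reliability(n, p):
    """Probability that a strict majority of n sensors is correct.

    Assumes (for illustration only) that each sensor is correct
    independently with probability p and that n is odd.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Reliability of the fused decision as redundancy grows (hypothetical p = 0.8).
for n in (1, 3, 5, 9):
    print(n, round(majority_reliability(n, 0.8), 4))
```

Under these assumptions the fused reliability rises with n whenever p exceeds 0.5. Once errors are correlated the gain shrinks, which is one reason the relation between measurement correlations and redundancy matters.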
In this first year, we have made considerable progress in framing the problem of uncertainty in sensor networks and of how to achieve trust in new models and experimental procedures. One begins with some reference model or experiment that is trusted for some limited set of circumstances or is too costly to apply to a larger situation. In a Bayesian context, this is the prior information; all progress in practice requires such information, since without it one cannot have any confidence in the eventual results. For example, one may have an expensive instrument that makes highly reliable local measurements. When it is combined with a trusted model of the smoothness of the physical phenomenon, a relatively small number of measurements suffices to achieve reconstruction with high fidelity. With little confidence in the model, on the other hand, a larger number of measurements is required to validate it. In another example, one may be attempting to use a large number of unreliable sensors to characterize some phenomenon. If the phenomenon is well modeled and the sensors have independent errors, then consensus algorithms can determine which sensors are particularly unreliable. Absent such conditions, however, trusted instruments are again required, at least to audit the sensors. Notice that in any case we must have trust in either a model or an experimental procedure, and further that the level of trust must be higher as greater fidelity is demanded. This requirement also, not incidentally, vastly increases the effort required to validate an extension of the model or experimental method, which strongly motivates re-use of trusted components.
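The consensus idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the project's algorithm. It assumes that all sensors observe the same scalar quantity, that their errors are independent, and that most sensors are reliable. The function name `flag_unreliable`, the threshold of 3, and the sample readings are hypothetical. The consensus value is the median, and spread is estimated by the median absolute deviation (MAD), scaled by 1.4826 so that it approximates a standard deviation under Gaussian noise.

```python
import statistics

def flag_unreliable(readings, threshold=3.0):
    """Return indices of sensors that deviate strongly from the consensus.

    Consensus is the median reading; spread is the scaled median
    absolute deviation. Both choices are assumptions of this sketch.
    """
    consensus = statistics.median(readings)
    deviations = [abs(r - consensus) for r in readings]
    scale = 1.4826 * statistics.median(deviations)
    if scale == 0:
        # More than half of the sensors agree exactly; flag any that differ.
        return [i for i, r in enumerate(readings) if r != consensus]
    return [i for i, d in enumerate(deviations) if d / scale > threshold]

# Hypothetical readings of the same quantity; sensor 4 is far from the rest.
readings = [20.1, 19.9, 20.0, 20.2, 25.0, 19.8, 20.1]
suspects = flag_unreliable(readings)
print(suspects)
```

If the independence or majority-reliable assumption fails, for example when many sensors share a common bias, this kind of check cannot detect the problem, which is where a trusted audit instrument becomes necessary.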
Another realization in the past year has been that there is also uncertainty in the goals of a model or experiment, in that success usually implies that new questions will be asked (scientists never seem to be satisfied for long with having some particular question answered). Thus what is to be optimized is performance over some set of related experiments, which raises the issues of how the model or experimental procedure is to be validated for the new questions posed and how robust the machinery is to changed conditions. This suggests another reason for the popularity of modular approaches and lends further impetus to the study of multi-scale methods, with their basic flavor of building up a toolbox of techniques. A goal of the research is to compare the performance and the level of validation effort of one-level systems with those of systems constructed in a modular fashion.
As we proceed with this research, more sophisticated scenarios will be considered, with the goal of producing a toolbox of techniques that have been validated first on simulated and then on real data. The result will be a hierarchical system, with context-dependent algorithms at the bottom and context-recognition algorithms at the top. In some sense, the model uncertainty algorithms will recognize context, while the data integrity algorithms will act to configure the systems within such contexts. Experiments and simulations will grow in sophistication with the development of the algorithmic toolbox.
PI: Greg Pottie
Participants: Nabil Hajuchehade, graduate student and Kevin Ni, graduate student