
Research Area


Statistics and Data Practice


Overview

This year, CENS introduces a new research area in the annual report, one that specifically addresses issues related to data practices in sensing. Its appearance is the product of several independent events within the Center, each of which foregrounds the importance of data practices in the design, deployment and maintenance of observing systems.

First, publication within CENS is shifting from purely network-related venues to application area journals, and, as a result, more emphasis is being placed on usable data and corresponding models for observed environmental phenomena. Second, this reporting period has seen the largest number of CENS deployments to date, and with this growth has come a host of new research questions related to the operational characteristics of fielded systems, including experimental design and sampling. Third, two cross-CENS initiatives have embarked on systematic, data-centric studies of deployments, and are producing software platforms to broadly address the usability of CENS sensing capabilities: a) an NSF-funded reading group on data quality and integrity is studying how sensing systems fail and, in turn, has developed human-in-the-loop strategies to identify and diagnose problems quickly; and b) as part of the core-funded “Data management” project, interviews were conducted with 22 CENS researchers to help us understand how data flow through the Center, and to make recommendations about data sharing practices that will broaden collaboration between different disciplines. Finally, the rise of our Urban Sensing projects has helped to underscore the importance of data for network design and functioning. Two recently funded NSF-FIND (Future Internet Network Designs) grants anticipate a broad adoption of sensing technologies and envision a kind of citizen-initiated sensing. One early finding is that as sensing transitions from forests and ecosystems to the built environment, the underlying data take on much more personal importance, and users’ privacy needs expose our data practices in the starkest terms.

Given these moves, it seems worthwhile to gather modeling and system efforts from across the Center and create a community of researchers and projects related to data practices. For this report, we have divided our work into three groups: modeling; strategies and platforms for deployment planning and data collection; and methods and systems support for data sharing.

Modeling. Statistical models are being developed to support the operation of virtually every component of a sensing system. Far from being a purely automated, black-box routine, modeling is also at the core of many of our human-in-the-loop systems, which requires a certain degree of transparency. In the projects documented here and in other sections of this report, the reader will find a variety of statistical models and estimation paradigms at work within CENS: from simple parametric (linear) models to modern, flexible non-parametric basis expansions; from time series to dynamic models; from ordinary Gaussian processes to sophisticated spatio-temporal processes based on mathematical models (ordinary or partial differential equations) describing the physics underlying environmental phenomena. We now focus on two specific classes of applications for statistical modeling.

Deployment design. This year, the project “Sensing under model uncertainty” turned to the problem of sensor placement, appealing to so-called “alphabet optimality” from experimental design. Classical optimality begins with a known parametric statistical model (say, a quadratic polynomial in spatial location) and seeks to take measurements (place sensor nodes) so as to achieve minimal estimation error (as defined by the experimenter). In many sensing deployments, however, we do not know the precise structure of the model; and, even if it is known, there might be many different models in play, one for each sensing modality. The “Sensing under model uncertainty” group has started to address these issues in the context of D-optimal designs (“D” for the determinant of the information matrix, one kind of error measure), considering both the primal and dual optimization problems. Their work has led to the consideration of a Bayesian analysis, and might begin to cross over into formal Bayesian design. Other work this year within CENS also leverages models for sensor placement: various staged or incremental design methodologies rely on models and their predictions to assess the next best locations to take measurements. In the year ahead, we anticipate transitioning this work to the field, as well as creating new designs for system robustness.
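To make the D-optimality criterion concrete, the following is a minimal sketch, not the project's method. It assumes a one-dimensional quadratic model, a hypothetical grid of 21 candidate locations on [-1, 1], and an exhaustive search over all three-sensor placements; the function names are illustrative only.

```python
from itertools import combinations


def features(x):
    """Regression vector for the assumed quadratic model y = b0 + b1*x + b2*x^2."""
    return [1.0, x, x * x]


def information_matrix(points):
    """Return X^T X for the design whose rows are features(x)."""
    rows = [features(x) for x in points]
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular information matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_optimal_design(candidates, n_sensors):
    """Exhaustively pick the n_sensors locations that maximize det(X^T X)."""
    return max(
        combinations(candidates, n_sensors),
        key=lambda pts: determinant(information_matrix(pts)),
    )


# Hypothetical candidate sites: an evenly spaced grid on [-1, 1].
candidates = [(i - 10) / 10 for i in range(21)]
best = d_optimal_design(candidates, 3)
best_det = determinant(information_matrix(best))
print(best, best_det)
```

On this grid the search selects the two endpoints and the midpoint, the classical D-optimal design for a quadratic on an interval. Exhaustive search is only practical for small candidate sets; it is used here purely to keep the criterion visible.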

Fig. 1: Sensor Data Slog Share Use

System operation and maintenance. In some sense, the genesis of a Statistics and Data Practice Area can be seen in the evolution of a successful CENS tool for fault diagnostics. Sympathy is a rule-based system for identifying network and application faults. With Sympathy as a starting point, we have introduced a new system called Confidence that looks more deeply at the behavior of deployed hardware and its interaction with environmental phenomena, and applies a dynamic statistical clustering technique to identify and diagnose anomalous sensors or measurements. In addition to Confidence, several projects were initiated this year that focus on outlier detection, fault detection, or, more broadly, the act of distinguishing normal from abnormal. The “Data Integrity” project, for example, introduced a cascade of models, from the observed phenomenon itself to sensor and network behaviors under normal and faulty conditions. Uncertainties in these models are incorporated in a Bayesian model for the complete system. Ultimately this work identifies faulty nodes by forming groups of sensors based on patterns in their measured data. The “Anomaly Detection” effort took a broader view, first dividing detection methods into groups: some depend on pre-specified rules (as was the case for Sympathy), some involve correlation analysis (simple regression models), and others leverage statistical learning schemes (Hidden Markov Models). This work also makes use of a fault model allowing for noisy measurements, the so-called “stuck at” fault, and transient bursts. In the year ahead, the group will create a detection framework within the Tenet architecture, embedding detection functionality deep into the network. Finally, we mention a collaboration with the University of Wisconsin to study sensor calibration. This project incorporates a novel modeling twist in which smoothness classes (defined via the discrete Fourier Transform or a Wavelet space) are applied to model sensor measurements. From here, a blind calibration scheme is proposed (“blind” in the sense that it does not require a “gold standard”) that has met with practical success.
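As an illustration of the rule-based end of the detection spectrum, the sketch below flags two of the fault types named above, “stuck at” runs and transient bursts, in a single sensor trace. It is not the Sympathy, Confidence or “Anomaly Detection” implementation; the thresholds, window size, function names and sample readings are all assumptions chosen for the example.

```python
from statistics import median


def stuck_at_runs(readings, min_length=5, tolerance=0.0):
    """Return (start, end) index pairs, end exclusive, of runs of unchanging values."""
    runs = []
    start = 0
    for i in range(1, len(readings) + 1):
        run_ended = i == len(readings) or abs(readings[i] - readings[start]) > tolerance
        if run_ended:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs


def transient_spikes(readings, window=5, threshold=3.0):
    """Return indices whose value is far from the median of a centered window."""
    half = window // 2
    spikes = []
    for i, value in enumerate(readings):
        neighbours = readings[max(0, i - half): i + half + 1]
        if abs(value - median(neighbours)) > threshold:
            spikes.append(i)
    return spikes


# Hypothetical temperature trace: one spike, then a sensor that stops updating.
trace = [20.1, 20.3, 20.2, 35.0, 20.4, 20.6,
         21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 20.8]
stuck = stuck_at_runs(trace)
spikes = transient_spikes(trace)
print(stuck, spikes)
```

Fixed rules like these are transparent and cheap, which suits human-in-the-loop diagnosis, but they cannot tell a faulty sensor from a genuinely unusual environment. That gap is what the correlation-based and statistical learning methods aim to close.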

Deployment Planning and Data Collection. The interviews conducted under the “Data management” project identified the need for pre-deployment planning support. As a result, this group has proposed the CENS Deployment Center (CENSDC), a web tool and service that allows researchers to create, publish and share a pre-deployment plan, and to assemble a post-deployment report documenting in-field experiences. By linking this report with data collected in the field and any subsequent publications, this tool will also enable rich new forms of knowledge transfer. Given the Center’s new emphasis on data and data processing, it should not be surprising that considerable effort has been devoted to a storage platform for sensing data. This year, we reimplemented SensorBase, borrowing heavily from models for user-generated content prevalent in so-called Web 2.0 applications like blogging. Each project within SensorBase has its own project page, the description of which (complete with “sparklines,” small graphics of measurement history) can be accessed through a simple search front end. The SensorBase team is working to extend this search to “assets” (find all the temperature sensors) and locations (find all the temperature sensors in Los Angeles), and has produced an initial scheme for signal search (find all the temperature sensors in Los Angeles that experienced at least a five-degree drop in the hour before sunrise). SensorBase also supports the Urban Sensing initiatives within CENS, and this connection has spawned a number of enhancements related to privacy and selective sharing of data among groups of users.
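The three levels of search described above (asset, location, signal) can be sketched over a small in-memory collection. This is only an illustration, not the SensorBase schema or query interface: the `Sensor` record, the `signal_search` function, the sensor names, the sunrise time and the readings are all hypothetical, and a “drop” is assumed to mean the largest decline from any reading to a later reading within the window.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Sensor:
    name: str
    kind: str   # asset type, e.g. "temperature"
    city: str   # location
    readings: list = field(default_factory=list)  # time-ordered (timestamp, value) pairs


def largest_drop(values):
    """Largest decline from any reading to a later reading (0.0 if there is none)."""
    drop, running_max = 0.0, float("-inf")
    for v in values:
        running_max = max(running_max, v)
        drop = max(drop, running_max - v)
    return drop


def signal_search(sensors, kind, city, sunrise, min_drop):
    """Names of sensors of a given kind and city with a drop in the hour before sunrise."""
    window_start = sunrise - timedelta(hours=1)
    hits = []
    for s in sensors:
        if s.kind != kind or s.city != city:
            continue  # asset and location filters
        window = [v for t, v in s.readings if window_start <= t <= sunrise]
        if largest_drop(window) >= min_drop:  # signal filter
            hits.append(s.name)
    return hits


def at(hour, minute):
    """Timestamp on a hypothetical deployment day."""
    return datetime(2007, 1, 15, hour, minute)


sunrise = at(6, 58)  # assumed sunrise time for the example
sensors = [
    Sensor("la-roof", "temperature", "Los Angeles",
           [(at(6, 0), 55.0), (at(6, 30), 52.0), (at(6, 55), 49.5)]),
    Sensor("la-yard", "temperature", "Los Angeles",
           [(at(6, 0), 54.0), (at(6, 30), 53.0), (at(6, 55), 52.0)]),
    Sensor("sd-pier", "temperature", "San Diego",
           [(at(6, 0), 60.0), (at(6, 55), 50.0)]),
    Sensor("la-humid", "humidity", "Los Angeles",
           [(at(6, 0), 80.0), (at(6, 55), 60.0)]),
]
hits = signal_search(sensors, "temperature", "Los Angeles", sunrise, 5.0)
print(hits)
```

Applying the cheap asset and location filters before scanning any measurements keeps the signal test, the expensive step, confined to sensors that could match.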

Sharing. The “Data management” interviews considered data sharing practices within CENS, specifically examining the characteristics of data being collected, discipline-specific policies for short- and long-term access to data, and effective architectural designs (“architecture” as in data formats, protocols for exchange, and schema for backend storage). The “Sensor-Internet Sharing and Search” project, one of the FIND groups, has also been actively developing protocols to publish data streams of sensor measurements, collect or aggregate these streams for analysis, and republish a new stream that might involve some kind of computation. One difficult aspect of this problem has been the propagation of metadata describing a sensor’s long-term performance (calibration, service history, relationship to nearby sensors). In the coming year, this group will consider how to “position” computation within a pipeline leading from sensors to the data consumer.
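The publish, aggregate and republish pattern, together with the metadata propagation problem, can be sketched with plain Python generators. This is a toy model under stated assumptions, not the project's protocol: the `Record` type, the `publish` and `aggregate_mean` functions, the stream names and the metadata fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    stream: str   # name of the stream this record belongs to
    time: int     # sample index
    value: float
    meta: dict = field(default_factory=dict)  # e.g. calibration or service history


def publish(stream, values, meta):
    """Publish a sensor's measurements as a stream of records."""
    for t, v in enumerate(values):
        yield Record(stream, t, v, meta)


def aggregate_mean(name, *streams):
    """Consume aligned streams and republish their per-step mean as a new stream.

    Each derived record carries the metadata of every source it was computed
    from, so a downstream consumer can still inspect sensor history.
    """
    for records in zip(*streams):
        mean = sum(r.value for r in records) / len(records)
        sources = {r.stream: r.meta for r in records}
        yield Record(name, records[0].time, mean, {"derived_from": sources})


node_a = publish("node-a", [10.0, 12.0], {"calibrated": "2007-01"})
node_b = publish("node-b", [14.0, 16.0], {"calibrated": "2006-11"})
site_mean = list(aggregate_mean("site-mean", node_a, node_b))
print([r.value for r in site_mean])
print(site_mean[0].meta)
```

Because each stage is a generator, the aggregation step could in principle run at the sensor, at a gateway or at the consumer without changing its interface, which is one way to think about “positioning” computation along the pipeline.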