CENTER FOR EMBEDDED NETWORKED SENSING

Research Project


Data Integrity In Sensor Networks

Principal Investigators

Mani Srivastava, Mark Hansen, Greg Pottie

Overview

Sensor networks are increasingly used for environmental monitoring. In a deployed sensor network, sensor readings often become faulty over the course of the deployment. Physical sensor problems, such as calibration issues or power supply problems, may cause abnormal sensor readings, and other unforeseen issues may arise as well. Our goal is to identify problems within the sensor network and determine the degree of confidence we have in our collected data. To ensure accurate data collection, it is important to detect faulty sensors on-line so that appropriate action can be taken in a timely manner to remedy a sensor issue. The Data Integrity group has pursued a broad range of research in the past year. The following represents the work of Kevin Ni.

Approach

To determine sensor faults accurately, several issues must be resolved.

We can then take our updated models and reapply our mechanism of determining conformal sensor behavior. If need be, these updated models can also trigger updates in our mechanism.

In [1] we address some of these issues of determining models and expected behavior. A faulty sensor is a sensor whose readings are not consistent with what is actually occurring. Since we are unable to ascertain the true behavior of the environment without a fully trusted node, we resort to using the agreement among a smaller group of sensors. We use a type of agreement problem combined with a Bayesian selection method in order to judge ground truth. In essence, we break the problem down into two steps. In the first step, we use Bayesian detection to decide which subset of sensors we believe best represents the data trends. From this subset of sensors we construct a model of the expected behavior, i.e., a "big picture" of what should be occurring at each sensor. In the second step, we judge whether or not sensors are faulty based upon the model developed from this subset.
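The second step can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm of [1]: it assumes the agreeing subset is already known, takes the average change between consecutive readings over that subset as the model of expected behavior, and flags a sensor when its own changes stray from the model by more than a threshold. The function names, the readings, and the threshold are all hypothetical.

```python
from statistics import mean

def trend(readings):
    """Change between consecutive readings."""
    return [b - a for a, b in zip(readings, readings[1:])]

def expected_trend(data, subset):
    """Model of expected behavior: average trend over the agreeing subset."""
    trends = [trend(data[s]) for s in subset]
    return [mean(step) for step in zip(*trends)]

def flag_faulty(data, subset, threshold):
    """Flag each sensor whose trend deviates from the model by more than
    `threshold` on average (an assumed decision rule)."""
    model = expected_trend(data, subset)
    flags = {}
    for sensor, readings in data.items():
        deviation = mean(abs(d - m) for d, m in zip(trend(readings), model))
        flags[sensor] = deviation > threshold
    return flags

# Hypothetical readings: sensors 1-3 sit at different levels but move together.
data = {
    1: [10.0, 11.0, 12.0, 11.0, 10.0],
    2: [8.0, 9.0, 10.0, 9.0, 8.0],
    3: [6.0, 7.0, 8.0, 7.0, 6.0],
    4: [10.0, 2.0, 15.0, 3.0, 20.0],
}
flags = flag_faulty(data, subset=[1, 2, 3], threshold=1.0)
```

Working on changes rather than raw values means a sensor is not penalized for reading consistently higher or lower than the others, only for moving differently.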

We select a Bayesian approach because it allows background and prior knowledge to be included in our selection decision. It also allows our prior information to be updated using previous decisions: we can use the posterior probability from the previous decision as the prior probability for our next decision. Finally, this approach allows us to assign a probability or likelihood to a sensor being faulty, which gives more leeway in how sensors can behave.
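The reuse of a posterior as the next prior can be illustrated with Bayes' rule for a single sensor and a binary faulty/non-faulty hypothesis. The sketch below is only an illustration under assumed numbers: the initial prior of 0.1 and the likelihood pairs are hypothetical, and the function name is not taken from [1].

```python
def update_fault_probability(prior, p_obs_given_faulty, p_obs_given_ok):
    """Bayes' rule for a binary hypothesis: returns P(faulty | observation)."""
    numerator = p_obs_given_faulty * prior
    evidence = numerator + p_obs_given_ok * (1.0 - prior)
    return numerator / evidence

# Hypothetical likelihoods of three successive observations,
# given as (P(obs | faulty), P(obs | not faulty)).
observations = [(0.6, 0.3), (0.7, 0.2), (0.8, 0.1)]

belief = 0.1  # assumed initial prior probability that the sensor is faulty
history = []
for like_faulty, like_ok in observations:
    # The posterior from the previous decision becomes the prior for this one.
    belief = update_fault_probability(belief, like_faulty, like_ok)
    history.append(belief)
```

With these numbers, each observation is more likely under the faulty hypothesis, so the belief that the sensor is faulty rises at every step.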

In our initial attempt to resolve some of the modeling questions discussed earlier, we have had some success in detecting faulty sensor behavior. We have used a maximum a posteriori (MAP) approach to determine a subset of sensors from which we develop a model of the expected behavior of the data trends for all sensors. This model provides us with a way of flagging a sensor that needs repair immediately, while the sensor network is still deployed.
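A MAP selection over candidate subsets can be sketched as follows. The prior and likelihood here are assumptions chosen for illustration, not those of [1]: each sensor is taken to be trustworthy independently with probability 0.8, and a subset is scored by a Gaussian-style penalty on how far its members' trends lie from their common average. All names and readings are hypothetical.

```python
from itertools import combinations
from math import log
from statistics import mean

def trend(readings):
    """Change between consecutive readings."""
    return [b - a for a, b in zip(readings, readings[1:])]

def log_likelihood(data, subset, sigma=1.0):
    """Assumed Gaussian log-likelihood (up to a constant): subsets whose
    trends stay close to their own average trend score higher."""
    trends = [trend(data[s]) for s in subset]
    center = [mean(step) for step in zip(*trends)]
    squared_error = sum((d - c) ** 2 for t in trends for d, c in zip(t, center))
    return -squared_error / (2 * sigma ** 2)

def log_prior(subset, n_sensors, p_good=0.8):
    """Assumed prior: each sensor is independently trustworthy with
    probability p_good, so larger subsets are favored a priori."""
    k = len(subset)
    return k * log(p_good) + (n_sensors - k) * log(1 - p_good)

def map_subset(data, min_size=2):
    """Return the candidate subset with the highest unnormalized log-posterior."""
    sensors = sorted(data)
    candidates = [c for k in range(min_size, len(sensors) + 1)
                  for c in combinations(sensors, k)]
    return max(candidates,
               key=lambda c: log_prior(c, len(sensors)) + log_likelihood(data, c))

# Hypothetical readings: sensors 1-3 move together, sensor 4 is erratic.
data = {
    1: [10.0, 11.0, 12.0, 11.0, 10.0],
    2: [8.0, 9.1, 9.9, 9.0, 8.1],
    3: [6.0, 6.9, 8.0, 7.1, 6.0],
    4: [10.0, 2.0, 15.0, 3.0, 20.0],
}
agreeing = map_subset(data)
```

Exhaustive enumeration is practical only for a handful of sensors, since the number of candidate subsets grows exponentially with the network size.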

We apply our algorithm to data from four temperature sensors deployed across a valley [2]. These sensors are placed at varying altitudes in the valley and measure cold air drainage temperatures. We therefore do not expect all temperatures to be the same. However, we do expect overall temperature increases and decreases to occur at all sensors at similar times. The first plot in Figure 1 shows the data collected from the deployed sensors. We see that there is a clearly faulty sensor, sensor 4, while the data from the other sensors generally move together.
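The distinction between matching temperatures and matching trends can be made concrete with a small sketch. The measure below, the fraction of time steps at which two sensors move in the same direction, is an assumption chosen for illustration rather than the measure used in this work, and the readings are hypothetical.

```python
def direction(readings):
    """+1 for a rise, -1 for a fall, 0 for no change between consecutive readings."""
    return [(b > a) - (b < a) for a, b in zip(readings, readings[1:])]

def co_movement(x, y):
    """Fraction of time steps at which two sensors move in the same direction."""
    dx, dy = direction(x), direction(y)
    return sum(a == b for a, b in zip(dx, dy)) / len(dx)

# Hypothetical readings: two healthy sensors at different altitudes and one erratic sensor.
high = [4.0, 5.0, 6.5, 6.0, 4.5]
low = [9.0, 10.5, 12.0, 11.0, 9.0]
erratic = [9.0, 3.0, 14.0, 15.0, 20.0]

healthy_pair = co_movement(high, low)
mixed_pair = co_movement(high, erratic)
```

The two healthy sensors differ by several degrees at every step yet agree in direction throughout, while the erratic sensor agrees with them only occasionally.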

Figure 1: Sensor data collected from deployed sensors and sensors included in the agreeing subset

Table 1 shows that the faulty sensor, sensor 4, was correctly marked as faulty 75.89% of the time. While sensors 1 and 3 individually remain under the design constraint of PFA = 0.05 (probability of false alarm), sensor 2 does not: it exceeds the constraint by 0.0296. As this sensor is, for the most part, not in the agreeing subset and not involved in the model development, we expect that it might be marked as faulty more often; however, this rate is higher than we designed for. Looking at overall performance, the total false detection rate is 0.1176, which is higher than we would like.

Table 1: Results for Actual Data

Sensor    Proportion Marked as Faulty
1         0.0186
2         0.0796
3         0.0194
4         0.7589
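
The figures quoted for Table 1 can be checked with a few lines of arithmetic. The sketch below assumes a proportion of 0.0194 for sensor 3, which is the value consistent with the statements that sensor 3 stays under PFA = 0.05 and that the total false detection rate is 0.1176. The variable names are illustrative.

```python
P_FA_TARGET = 0.05  # design constraint on the false alarm probability

# Proportion of time each non-faulty sensor was marked as faulty.
# Sensor 3 is assumed to be 0.0194 (see the note above).
false_marks = {1: 0.0186, 2: 0.0796, 3: 0.0194}

# Sensors that exceed the design constraint, and by how much.
excess = {s: round(p - P_FA_TARGET, 4)
          for s, p in false_marks.items() if p > P_FA_TARGET}

# Total false detection rate over the non-faulty sensors.
total_false = round(sum(false_marks.values()), 4)

print(excess)       # {2: 0.0296}
print(total_false)  # 0.1176
```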

Future Directions

In our current algorithm, we characterize sensors as either faulty or non-faulty. We would like to allow for more leeway in our decisions, and we seek to be less deterministic in our classification of faulty nodes. Doing so lets us judge the confidence we have in a particular sensor, which can then play a role in any final inference made about the phenomenon. Our goal then becomes accurately inferring a dynamic physical model of the phenomenon from the data collected by the sensors. If we have a probabilistic characterization of sensor behavior, it influences our confidence in the model. If we would like to model the phenomenon to within a set fidelity criterion, we will need to know the degree of accuracy with which the sensors are measuring, rather than treating them as completely correct or completely false as our current algorithm does.

A sensor may behave correctly for most of its lifetime and yet still exhibit significant noise or a significant number of faulty readings. We can give a confidence level for such a sensor that represents how often these faulty readings occur. From this, we can express how much we trust the data from any sensor and give a confidence on our final dynamic physical model.
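
One possible way to attach such a confidence level to a sensor is to estimate its fault rate from the readings flagged so far. The sketch below is an assumption for illustration, not a method from this project: it uses the posterior mean of the fault rate under a Beta prior and reports one minus that rate as the confidence. The function names, the prior parameters, and the counts are hypothetical.

```python
def fault_rate_estimate(n_faulty, n_total, alpha=1.0, beta=1.0):
    """Posterior mean of a sensor's fault rate under an assumed
    Beta(alpha, beta) prior, after n_faulty of n_total readings were flagged."""
    return (n_faulty + alpha) / (n_total + alpha + beta)

def confidence(n_faulty, n_total):
    """Confidence in a sensor, taken here as one minus its estimated fault rate."""
    return 1.0 - fault_rate_estimate(n_faulty, n_total)

# Hypothetical counts: 5 of 98 readings from one sensor were flagged as faulty.
rate = fault_rate_estimate(5, 98)
trust = confidence(5, 98)
```

A graded value of this kind could be used to weight each sensor's data when fitting the physical model, instead of discarding a sensor outright.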

People

Faculty:

Graduate Students: