Leana Golubchik, Ramesh Govindan
Various sensor network measurement studies have reported instances of transient faults in sensor readings. In this work, we seek to answer a simple question: How often are such faults observed in real deployments? To do this, we first explore and characterize three qualitatively different classes of fault detection methods. Rule-based methods leverage domain knowledge to develop heuristic rules for detecting and identifying faults. Estimation methods predict “normal” sensor behavior by leveraging sensor correlation, flagging anomalous sensor readings as faults. Finally, learning-based methods are trained to statistically identify classes of faults. We find that these three classes of methods sit at different points on the accuracy/robustness spectrum. Rule-based methods can be highly accurate, but their accuracy depends critically on the choice of parameters. Learning methods can be cumbersome, but can accurately detect and classify faults. Estimation methods are accurate, but cannot classify faults.
We apply these techniques to four real-world sensor data sets and find that both the prevalence and the type of faults vary across data sets. All three methods are qualitatively consistent in identifying sensor faults, lending credence to our observations. Our work is a first step toward automated online fault detection and classification.
We focus on a small set of sensor faults that have been observed in real deployments: single-sample spikes in sensor readings (we call these SHORT faults), longer duration noisy readings (NOISE faults), and anomalous constant offset readings (CONSTANT faults). Figure 1 displays these faults in sensor measurements collected during different sensor network deployments.



Figure 1: CONSTANT fault (left), SHORT fault (middle), NOISE fault (right)
We explore three qualitatively different techniques for automatically detecting such faults from a trace of sensor readings.
Rule-based methods leverage domain knowledge to develop heuristic rules for detecting and identifying faults. For example, a simple heuristic rule for detecting NOISE faults can be as follows:
NOISE Rule: Compute the standard deviation of the sample readings within a window of N samples. If it is above a certain threshold, the samples in that window are considered corrupted by a NOISE fault.
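The NOISE rule can be sketched in a few lines of Python. This is a minimal illustration, not the exact implementation used in this work: the function name `noise_rule`, the use of non-overlapping windows, the population standard deviation, and the default values for `window` and `threshold` are all assumptions made for the example.

```python
from statistics import pstdev

def noise_rule(readings, window=10, threshold=2.0):
    """Flag every sample in a window whose standard deviation exceeds threshold.

    Assumptions (not from the source): windows do not overlap, and a trailing
    partial window shorter than `window` is left unflagged.
    """
    flags = [False] * len(readings)
    for start in range(0, len(readings) - window + 1, window):
        chunk = readings[start:start + window]
        if pstdev(chunk) > threshold:
            # The whole window is marked as corrupted by a NOISE fault.
            for i in range(start, start + window):
                flags[i] = True
    return flags
```

The sketch makes the parameter sensitivity visible: both `window` and `threshold` must be chosen per sensor and per deployment, since a threshold that is too low flags normal variation and one that is too high misses low-intensity faults.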
Linear Least-Squares Estimation (LLSE) based methods predict “normal” sensor behavior by leveraging the spatial correlation between readings from different sensor nodes, flagging deviations from the normal as sensor faults.
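As a rough illustration of the idea, the sketch below fits a linear least-squares predictor of one sensor's reading from a single neighboring sensor and flags readings that deviate from the prediction. The two-sensor scalar form, the function names `fit_llse` and `llse_flags`, the fault-free training trace, and the fixed `threshold` are assumptions for this example; the source does not specify these details.

```python
from statistics import fmean

def fit_llse(neighbor_train, target_train):
    """Fit target ~ slope * neighbor + intercept by least squares.

    Assumes the training readings are fault-free and the neighbor
    readings are not all identical (otherwise the variance is zero).
    """
    mx, my = fmean(neighbor_train), fmean(target_train)
    cov = sum((x - mx) * (y - my) for x, y in zip(neighbor_train, target_train))
    var = sum((x - mx) ** 2 for x in neighbor_train)
    slope = cov / var
    return slope, my - slope * mx

def llse_flags(neighbor, target, slope, intercept, threshold):
    """Flag target readings that deviate from the estimate by more than threshold."""
    return [
        abs(y - (slope * x + intercept)) > threshold
        for x, y in zip(neighbor, target)
    ]
```

Note that this kind of detector only reports that a reading is anomalous relative to its neighbor; it carries no information about which fault type caused the deviation.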
Finally, Hidden Markov Model (HMM) learning-based methods are trained to statistically detect and identify classes of faults. The states of an HMM mirror the characteristics of both the physical phenomenon being sensed and the sensor fault types. For example, based on our characterization of faults (Figure 1), for a sensor measuring ambient temperature we can use a 5-state HMM with states corresponding to day, night, SHORT faults, NOISE faults, and CONSTANT faults. Such an HMM can capture not only the diurnal pattern of temperature but also the distinct patterns in the reported values in the presence of faults. We flag a sensor reading as erroneous if the state assigned to it by the HMM corresponds to one of the fault states.
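The state-assignment step can be illustrated with a toy example. The sketch below uses a reduced 3-state model (day, night, SHORT) rather than the 5-state model described above, with readings discretized into the hypothetical symbols "warm", "cool", and "spike". All probabilities are hand-picked for illustration, not learned from data, and Viterbi decoding is used here as one standard way to assign states; the source does not say which decoding procedure was used.

```python
import math

STATES = ["day", "night", "SHORT"]
FAULT_STATES = {"SHORT"}

# Hand-picked illustrative parameters (assumptions, not trained values).
START = {"day": 0.49, "night": 0.49, "SHORT": 0.02}
TRANS = {
    "day":   {"day": 0.90, "night": 0.08, "SHORT": 0.02},
    "night": {"day": 0.08, "night": 0.90, "SHORT": 0.02},
    "SHORT": {"day": 0.45, "night": 0.45, "SHORT": 0.10},
}
EMIT = {
    "day":   {"warm": 0.899, "cool": 0.100, "spike": 0.001},
    "night": {"warm": 0.100, "cool": 0.899, "spike": 0.001},
    "SHORT": {"warm": 0.050, "cool": 0.050, "spike": 0.900},
}

def viterbi(observations):
    """Return the most likely state sequence for the observed symbols."""
    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(EMIT[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the most likely final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

observations = ["warm", "warm", "spike", "warm", "warm", "cool", "cool", "cool"]
path = viterbi(observations)
faulty = [state in FAULT_STATES for state in path]
```

With these toy parameters the isolated "spike" symbol is assigned to the SHORT state and is therefore the only reading flagged, while the warm-to-cool change is explained by a day-to-night transition rather than by a fault.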
Injected Faults: By artificially injecting faults of varying intensity into sensor data sets, we are able to study the detection performance of these methods. We find that these methods sit at different points on the accuracy/robustness spectrum. While rule-based methods can detect and classify faults, they can be sensitive to the choice of parameters. By contrast, the estimation method we study is somewhat more robust to parameter choices, but it relies on spatial correlations and cannot classify faults. Finally, our learning method (based on Hidden Markov Models) is cumbersome, partly because it requires training, but it can detect and classify faults fairly accurately. We also explored hybrid detection techniques, which combine these three methods in ways that can be used to reduce either false positives or false negatives, whichever is more important for the application.
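A fault-injection experiment of this kind can be sketched as follows. The multiplicative model for SHORT faults (scaling a reading by `1 + intensity`), the helper names, and the fixed-threshold stand-in detector in the usage lines are assumptions for illustration; the source does not describe the injection model or the scoring procedure.

```python
def inject_short_faults(readings, indices, intensity):
    """Return a copy of readings with SHORT faults injected at the given indices.

    Assumed fault model: the faulty value is the original scaled by (1 + intensity).
    """
    faulty = list(readings)
    for i in indices:
        faulty[i] = readings[i] * (1 + intensity)
    return faulty

def score_detection(flags, injected_indices):
    """Compare detector flags against the known injected fault positions."""
    injected = set(injected_indices)
    detected = {i for i, flagged in enumerate(flags) if flagged}
    return {
        "true_positives": len(injected & detected),
        "false_positives": len(detected - injected),
        "false_negatives": len(injected - detected),
    }

# Usage: the same stand-in detector evaluated at two fault intensities.
clean = [20.0] * 10
results = {}
for intensity in (0.05, 0.5):
    trace = inject_short_faults(clean, [3, 7], intensity)
    flags = [abs(v - 20.0) > 2.0 for v in trace]  # hypothetical detector
    results[intensity] = score_detection(flags, [3, 7])
```

Because the injected positions are known, false positives and false negatives can be counted exactly, which is what makes it possible to compare detectors as the fault intensity varies.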
Real-World Data Sets: We applied our detection methods to real-world data sets. Here, we present results from the Great Duck Island (GDI) data set, where we examine the fraction of faulty samples in a sensor trace. The predominant fault in the readings was of type SHORT. We applied the SHORT rule, the LLSE method, and Hybrid(I) (a hybrid detection technique) to detect SHORT faults in light, humidity, and pressure sensor readings. Figure 2 shows the overall prevalence of SHORT faults, computed by aggregating results from all 15 nodes, for the different sensors in the GDI data set. (On the x-axis of this figure, the SHORT rule is labeled R, LLSE is labeled L, and Hybrid(I) is labeled I.) The Hybrid(I) technique eliminates any false positives reported by the SHORT rule or the LLSE method. The intensity of the SHORT faults was high enough that they could be identified by visual inspection of the full time series of sensor readings. This ground truth is included for reference in the figure under the label V. It is evident from the figure that SHORT faults are relatively infrequent. They are most prevalent in the light sensor readings (approximately 1 fault every 2000 samples).
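The prevalence computation and the hybrid combination can be sketched as follows. Treating Hybrid(I) as the intersection of the individual detectors' flags is an assumption inferred from the statement that it eliminates false positives reported by either method; the union variant is included only as the natural counterpart for reducing false negatives. The function names are hypothetical.

```python
def hybrid_intersection(*flag_lists):
    """Flag a sample only if every detector flags it (fewer false positives)."""
    return [all(flags) for flags in zip(*flag_lists)]

def hybrid_union(*flag_lists):
    """Flag a sample if any detector flags it (fewer false negatives)."""
    return [any(flags) for flags in zip(*flag_lists)]

def prevalence(flags):
    """Fraction of samples in a trace that are flagged as faulty."""
    return sum(flags) / len(flags)

# Usage with made-up flags from two detectors over the same four samples.
rule_flags = [False, True, True, False]
llse_flags = [False, True, False, True]
both = hybrid_intersection(rule_flags, llse_flags)
either = hybrid_union(rule_flags, llse_flags)
```

Under this definition, a prevalence of about 1 fault every 2000 samples corresponds to a value of roughly 0.0005.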
Figure 2: Great Duck Island Data Set
In summary, during this reporting period, we:
We believe that our work opens up new research directions in automated high-confidence fault detection, classification, and data rectification. More sophisticated statistical and learning techniques than those we have presented can be brought to bear on this crucial area. Over the next year, we plan to develop an online, automated sensor fault detection framework and integrate it with an existing sensor network architecture, such as TENET.
Faculty:
Graduate Students