Skip Header NavigationIntranet 
CENTER FOR EMBEDDED NETWORKED SENSINGContactDirectionsEmploymentEventsNews
HomeAbout UsResearchEducationResourcesPeople

Research Project


Adaptive Sampling by Using a Robotic Boat and a Static Sensor Network

Technology > Multiscaled Actuated Sensing > Adaptive Sampling by Using a Robotic Boat and a Static Sensor Network

On this page: Overview | Approach | Systems/Experiments | Accomplishments | Future Directions | People

Lead Investigators:

Gaurav S. Sukhatme

Overview

Sensor networks provide new tools for observing and monitoring the environment. In aquatic environments, accurately measuring quantities such as temperature, chlorophyll, salinity, and concentration of various nutrients is useful to scientists seeking a better understanding of aquatic ecosystems, as well as government officials charged with ensuring public safety via appropriate hazard warning and remediation measures. Broadly speaking, these quantities of interest are scalar fields. Each is characterized by a single scalar quantity which varies spatiotemporally. A characteristic of such scalar fields is that the sensor readings are only valid locally. That is, the correlation between the sensors at different locations decreases rapidly with increasing distance between the sensor nodes. To estimate the scalar field where no sensor nodes are deployed, we need to interpolate the data. Intuitively, the more the readings near the location where a field estimate is desired, the less the reconstruction error. In other words, the spatial distribution of the measurements (the samples) affects the estimation error.

In many cases, it may not be feasible to move the static sensor nodes after deployment. In such cases, one or more mobile robots could be used to augment the static sensor network, hence forming a sensor-actuator network or a robotic sensor network. Static nodes and mobile robots both have their advantages and constraints. Static nodes generally consume less energy than mobile robots and their batteries last longer. However, static nodes cannot move and cannot change the spatial distribution of the sensor readings while mobile robots are able to move and take sensor readings at different locations.

An immediate question to ask is how to coordinate the mobile robots and the static nodes such that the error associated with the estimation on the scalar field is minimized subject to the constraint that the energy available to the mobile robot(s) is bounded. Specifically, if each static node makes a measurement in its vicinity, and the total energy available to the mobile robot is known, what path should the mobile robot take to minimize the mean square integrated error associated with the reconstruction of the entire field? Here we assume that the energy consumed by communications and sensing is negligible compared to the energy consumed in moving the mobile robot. We also assume that the mobile robot can communicate with all the static nodes and acquire sensor readings from them. Finally, we focus on reconstructing phenomena which do not change temporally (or change very slowly compared to the time it takes the mobile robot to complete a tour of the environment).

We developed a general solution to the above problem and test it on the NAMOS test bed, which consists of anchored buoys (the static nodes) and a robotic boat (the mobile robot) capable of measuring temperature and chlorophyll concentrations.

Approach

Non-parametric regression has been well studied in statistics and many methods have been proposed. We use a local linear regression to estimate the phenomenon. The asymptotic properties of 1 dimensional local linear Regression were first studied by Fan and later extended by Ruppert to the multi-dimensional case. The main take away message from these two studies is that the Integrated Mean Square Error (IMSE) of the estimator is related to the second derivatives of the phenomenon investigated.

If m of Xis the function to be estimated (reconstructed), we assume the following model
Y sub i equals m of x sub i plus the square root of v times x sub i times epsilon sub i,                                                                                                            (1)
where i = 1, ..., n, d is the number of dimensions, x sub iare R to the d-valued predictor variables, y sub iare scalar response variables, v of x sub i equals Var of Y for x equals x sub i is finite and the epsilon sub i are mutually independent and identically distributed random variables with zero mean and unit variance and are independent of x sub i.

Suppose H to the negative one half is the d-dimensional bandwidth matrix and K is a d-variate kernel which satisfies: 1. integral of K of u equals 1; 2. mu two of K times I equals the integral of u times u to the T times K of u du and 3. the integral of u sub 1 to the l sub i to u sub d to the l sub d times k of u d u equals zero for all non-negative integers l sub 1 through l sub d such that their sum is odd, it has been proved that the estimation error associated with the local linear regression is given by the following equation:
MSE of m hat of x and H equals R of K times v of X divided by n times the square root of H times f of x plus one forth of mu sub 2 squared of K times t times r squared of H times H sub m of x plus o sub f times the total of 1 over n times the square root of H plus t times r squared of H              (2)
whereR of K equals the integral of K squred of u d u, f(x) is the density function with the integral of f of x d x equals 1, and H sub m of X is the Hessian matrix of m of x.

In equation 2, when n times the square root of H is big enough and H is small enough, the MSE can be approximated without the infinitesimal. If we assume the Hessian matrix of m of x is known, we can determine the optimal bandwidth and optimal density function by minimizing the Integrated MSE with the constraintthe integral of f of x d x equals 1. By applying the Lagrange-Euler differential equation, we have the optimal density function as follows:
f star of x is proportional to mu sub 2 of K times t times r of H sub m of x all to the 2 d divided by d plus 8 then multiplied by R of K times v of x both to the 4 divided by d plus 9                                                                           (3)
where we assume the bandwidth matrix is defined as H equals h squared times I. f star of x is called the optimal design. When the cost of moving from one sample location to another is small compared to the cost of taking sensor readings, we can use f star of x to generate the sample locations. When samples are to be taken by a mobile robot, optimal design alone is not feasible because now the constraint is not the number of samples to be taken but the distance that the mobile robot can travel. This limits how many samples can be taken and where those samples can be taken.

Assume that initially there are n not readings from static sensor nodesthe pairs from x sub 1 and y sub 1 to x xub n not and y sub n not, the path of the mobile robot passes through the pointsX sub n not plus 1 through X sub n not plus n, and then the optimal path should minimize IMSE, which can be estimated as follows.
IMSE of X sub 1 through X sub n not plus n is proportional to the integral of t times r of H sub m of x times v of x squared squared divided by n squared times f hat squared of x all to the 2 divided by d plus 4 times d x,                                                                (4)

where f hat of x equals 1 over n times the summation from i equlas 1 to n of K sub h of X sub i minus xis the estimation of the density function. Similar to the information gain defined in robot exploration literature, we define the gain for each point as
G of x equals IMSE of X sub 1 through X sub n not minus IMSE of X sub 1 through X sub n not plus 1,
and the gain for a path p as
G of p equals IMSE of X sub 1 through X sub n not minus IMSE of X sub 1 through X sub n not plus n.
Note that the gain of the whole path is not the sum of the gain of all points on the path because the gain at each location is not independent with each other. On the one hand, one new sampling point not only increases the sampling density at/near that point. On the other hand, MSE is not linear in the density function. The higher the initial density function at the location x is, the less the gain that is achieved by taking one more sample at x.

However, if the whole path is taken as the state, the size of the state space is exponential in the available energy and hence is computationally intractable. By examining the effect of dependency closely, we find that the effect is local and the range it affects depends on the bandwidth. Therefore, if we discretize the sensing field sparsely enough, we can approximate the gain of the path by using the sum of the gains of all the points on the path. In this case, the state can be represented by the position of the node, current energy available to the robot, and the gain achieved. Now the search space is linear in both the size of the environment and the energy level and a breadth first search can be applied to find the optimal path.

figure 1a   Figure 1b

(a)                                                                           (b)
Figure 1 (a) The robotic boat (b) The energy consumption model for the robotic boat

To apply an algorithm to search for the optimal path for sampling, an energy consumption model is necessary to determine if two states are adjacent to each other and the energy consumed when the mobile node transit from one state to another. The energy consumption model used is depicted in Figure 1(b). As shown in the figure, from each state, the mobile node is able to transit to 5 other states. We assume that the energy caused by state transition is proportional to the distance the robot traveled. Circular arcs are used to approximate the optimal curves. As a result, the energy consumed if the boat moving forward is 1 unit and all other state transition cost 1.6 units.

Systems/Experiments

Figure 2aFigure 2b
                              (a)                                                                              (b)
Figure 2 (a) The raster scan data set. (b) The interpolated temperature field by using whole raster scan data set.

Our algorithm has been tested with the raster scan data taken by the robotic boat in Lake Fulmor, CA, during a period of 40 minutes in the afternoon of May 9th, 2006. Figure 2(a) shows the points in the entire raster scan data set. The solid curve is the boundary of the sensing field and the black dots are the data points. Temperature readings are used in the tests and Figure 2(b) shows the temperature field interpolated with the whole data set. We mine this data set (the ground truth) as follows to test the algorithm via a data-denial simulation. We assume that the boat is able to follow the optimal path generated by the adaptive sampling algorithm accurately. When one sampling location x is generated, we search in the raster scan data set for the point closest to x and take the corresponding temperature reading as the sensor readings at location x. Once the process of sampling is done, the temperature field is estimated and the result is compared with the raster scan data instead of current temperature on the surface of the lake. As a result, even if the temperature field changes during the time when the robotic boat is doing raster scan, the data set still can be used as taken from a static field. It is also assumed that there exist 15 static nodes to provide the initial data set and the locations of those static nodes are shown as small circles in Figure 2(a). Because we are going to use some of the temperature readings in the raster scan data set as the readings of the static sensors, the locations of the static nodes are picked so that they coincide with some of the points in the raster scan data set.

Figure 3
Figure 3 The square of the trace of the estimated Hessian matrix.

We assume that the boat starts from the top right corner of the lake. The initial direction of the boat is west, i.e., facing toward the center of the lake. Figure 3 shows the square of the trace of the estimated Hessian matrix. The higher the trace, the higher the gain can be achieved by taking a reading there. As shown in Figure 3, the trace of the estimated Hessian matrix is higher in the bottom left part of the lake than in the top right part. As a result, the boat tends to take more samples in the bottom left part. Figure 4(a) and 4(b) show two typical paths generated by the algorithm. The triangles indicate the directions of the boat. The Figure 4(a) shows the path with initial energy 25 units and Figure 4(b) shows the path with the initial energy 50 units. When the initial energy is high, the path planner tends to allow the boat to wander a little on its way to the place with maximum gain so that it does not have to move back later. When the initial energy is low, the path planner requires the boat to go straight to those locations with maximum gain.

Figure 4aFigure 4b

(a)                                                                (b)
Figure 4. Path generated by adaptive sampling. (a) Initial energy is 25; (b) Initial energy is 50.

Figure 5

Figure 5. Comparison between adaptive sampling and random walk.

Our algorithm is compared to a random walk with same initial state. For each method, we generate a set of new data points and then estimate the temperature with the new data set together with the initial readings from the static sensors, which are still part of the raster scan data. Then the estimation is compared to the whole raster scan data set assuming that the raster scan is the ground truth. The IMSE is approximated by summing the square error at each data point of the raster scan. For each initial energy value, the random walk runs for 50 trials and the IMSE reported is the average over these 50 trials. Figure 4 shows the comparison of the IMSE between the two methods with the initial energy being varied from 5 to 60 units. From the figure, we can see that with the adaptive sampling algorithm, the IMSE is about 20% less than a random walk. Since IMSE is proportional to n to the two thirds for the best case in 2D problem, a 20% improvement is significant.

Our algorithm has also been tested on the boat in a lake in Echo Park in Los Angeles and the target scalar field is the temperature field. Once again, we use the raster scan data as the ground truth. The boat was first used to quickly perform a raster scan in a 40m x 70m area, which is part of the lake. We assume that the temperature field did not change significantly during the period when the experiments are carried out. Figure 6 shows the results of one experiment. The solid lines in Figures 6(a) show the boundary of the test area and the circles are the locations where initial temperature readings are available.

The boat starts from the right bottom corner facing north in the first experiment and top right corner facing west in the second experiment. The triangles in Figure 6(a) are the planned locations where the boat should go and take temperature readings and the dots show the GPS record of the boat, which show the actual track. Note that the actual track did not follow the planned path strictly. This is due to the GPS resolution and the effect of the wind. Currently, there is no battery monitor on the boat and hence we cannot measure the energy consumed by the boat directly. However, during the experiments, the speed of the propeller on the boat is approximately constant. That is, the thrust generated by the propeller is approximately constant and hence we can approximate the energy consumption by measuring the distance the boat traveled. With the dense GPS readings along the track, it is easy to compute the distance. Figure 6(b) show the IMSE of the estimation vs. the distance the boat traveled in both experiments respectively.
Figure 6aFigure 6b
(a)                                                                         (b)
Figure 6. (a) shows the planned path and the GPS trace of the actual track the boat took. (b) shows estimation error v.s. distance the boat traveled (the energy consumed by the boat is approximately proportional to the distance traveled).

Accomplishments

We have proposed an adaptive sampling algorithm for a mobile sensor network consisting of a set of static nodes and a mobile robot tasked to reconstruct a scalar field. Our algorithm is based on local linear regression. Sensor readings from static nodes (a set of buoys) are sent to the mobile robot (a boat) and used to estimate the Hessian Matrix of the scalar field (the surface temperature of a lake), which is directly related to the estimation error. Based on this information, a path planner generates a path for the boat such that the resulting integrated mean square error (IMSE) of the field reconstruction is minimized subject to the constraint that the boat has a finite amount of energy which it can expend on the traverse. Data from extensive traverses in the field as well as simulations validate the performance of our algorithm.

Future Directions

We are currently working on how to determine the appropriate resolution to discretize the sensed field. One interesting observation from the simulations and experiments is that when the initial available energy is increased, the estimation errors decrease rapidly and level off instead of decreasing to zero. Theoretically, when the energy available to the mobile node increases, more sensor readings can be taken and hence the estimation errors should keep decreasing. By examining the path generated by the adaptive sampling algorithm, we found that when the initial energy is enough for the mobile node to go through all the 'important' locations, increasing the initial energy does not have much effect on the estimation error. We plan to investigate advanced path planning strategies and alternative sampling design strategies in future work.

People

Faculty:

Staff:

Graduate Students: