Deep Ocean Bubble Tracking

Jason Rife

EE368 Final Project

Robotic cameras can be used to track gas bubbles in the deep ocean. A vision algorithm is described that permits this task. The vision algorithm is tailored for a structured test environment, open to the ocean. The algorithm uses a differencing filter robust to reflective surfaces. Differenced images are segmented to find likely bubble locations. Locations are compared over time to determine most likely bubble path. The algorithm produces an actuation signal to instruct the robotic camera to recenter the target bubble.

1. Introduction

By storing greenhouse gases in the deep ocean our industrial society may slow the rate of global warming. The high pressure conditions in the deep ocean permit an unusual phase mixture of water and gas that is more dense than the surrounding sea water. This hydrate phase sinks toward the bottom of the ocean, trapping the gas and preventing it from re-entering the atmosphere. If environmental impact studies demonstrate that these methods are not detrimental to the marine habitat, then cost becomes the only factor limiting deep water release of CO2 and CH4.

Understanding the transport physics associated with deep ocean gas release is a subject of current research interest. In particular, marine scientists seek to learn at what rate gas bubbles are absorbed into ocean water, and at what rate the bubbles travel through the water column. These questions can be answered experimentally by releasing small bubbles in the deep ocean and by tracking those bubbles with a robotic camera. To fully automate the bubble tracking process, a vision algorithm is required.

 2. Field Experiments

Bubble tracking will be conducted from the Ventana, a remotely-operated vehicle (ROV) maintained by the Monterey Bay Aquarium Research Institute (MBARI). The Ventana ROV, which is capable of descending to depths approximately 2000m, is pictured below. The vehicle is piloted via tether from a control room on its mothership, Point Lobos.

(Figure 1)

Bubbles are released inside a test section mounted on Ventana, directly in front of the vehicle's HDTV camera. Measuring 1m in height, 0.3m in width, and 0.15m normal to the camera, the test section is open on top and bottom to permit bubble motion independent of the ROV. Bubbles are released at a depth of 800m and must be followed to a depth of 400m. During the ascent, gas is absorbed from the bubble into the surrounding water, causing the bubble mass to decrease. Bubble speed is approximately 0.1 m/s in the vertical direction. To obtain the required bubble data, Ventana must track a bubble for over an hour, a daunting task for a human pilot.

The automated bubble tracker places this burden in the hands of a computer, an alternative pilot well suited for high precision endurance tasks. Early design of this control system has been completed off-line in a laboratory setting.

Video footage taken from Ventana during proof-of-concept runs is useful for design of the automated control system. The vision algorithm described in this paper makes use of two video tapes recorded earlier this year with the ROV running under human control. These tapes have been provided courtesy of Peter Brewer, a MBARI expert in the field of Ocean Chemistry.

The geometry of the test section is such that the bubble moves primarily in the image plane. The narrow depth of the test section constrains the bubble from substantial motion parallel the camera optical axis. This constraint turns a three-dimensional problem into a two-dimensional problem, a fact convenient for the visual tracking system.

 3. Existing Vision Algorithms

The Vision literature is rife with methods for change detection, shape identification, and object tracking. Choosing the appropriate methods applicable to bubble tracking is not a trivial task. Most tracking algorithms in the literature fit into the structure of the following chart. 

(Figure 2)

The processing block extracts critical information from the original image. If, for instance, the tracking target is significantly brighter than the background, raw image intensity may be used. For other targets, intensity gradient or color may be better choices. If the target is moving, much of the critical information can be found in pixel changes between images.

The segmentation block organizes the critical information by grouping clusters of related pixels. The most commonly employed segmenting technique is thresholding. As an alternative to thresholding, snakes find clusters of related pixels without requiring a predetermined level set. The snake equation acts like a momentum balance that "forces" the elastic shape of the snake to wrap around contours where image gradients are steep.

The matching block scans the possibilities recognized during segmentation. If one of the possibilities is a strong match for the target, a control signal is output corresponding to the identified target location. If no strong match is found, the matching block must output a "no confidence" signal, which the control algorithm will interpret accordingly. Local statistics, like target area or second moments, work well for general matching. For moving targets, proximity to an expected target location is a useful matching metric.

It should be noted that the statistical segmentation techniques, including correlation and genetic algorithms, produce a ranked list of outputs. Rank is based on quality of the correlation or genetic match. This rank can be used directly to identify the most likely target, effectively combining the segmentation and matching tasks.

4. Designing A Vision Algorithm

Two key criteria govern the selection of image processing techniques. The first criterion is the ability to locate a bubble. The second is the ability to run in real time. The options discussed thus far are generally applicable in real time implementations, so the ability to locate bubbles is the focus of the design decision.

 4.1. Choosing a Processing Block:

What key visual information indicates the presence of a bubble? Still frame images captured from video demonstrate that bubbles cannot easily be located in individual images. Bubble intensity and color are very closely related to that of the background. Intensity variations within a typical image are much greater than the variation between a bubble and the background. This difficulty is apparent in the image below.

 

(Figure 3)

Despite the difficulty of observing the bubble in a still image, a human observer more easily recognizes bubble location from video. Applying this observation about human perception, one might suggest quantifying image changes or image motion as a means for bubble detection.

 4.1.2 Optical Flow

Previous workers have used optical flow as a convenient means to compute motion within an image. Optical flow maps subsequent images into a vector field that corresponds to motion of pixel groups. Optical flow vectors should be parallel and of the same magnitude in the vicinity of a moving object, and nearly zero for all background locations. The simulated diagram below represents optical flow for an object moving across a still background.

 

(Figure 4)

Many optical flow formulations exist. One of the simplest ways to implement optical flow is to compute the local velocity vectors using Equation 1.

(Eq. 1)

In this equation, I is image intensity, v is the two element optical flow velocity vector, and S is a source term. This is a single equation for two unknowns, the two elements of v. In practice, the equation can be solved in a least squares sense for several pixels in a region. The solution of the linear least squares problem is efficient, particularly if the window is small (9 pixels, for instance).

The largest problem with the optical flow calculation results from modeling of the source term, S. For many applications, it is sufficient to set S=0. This assumption is appropriate when viewing Lambertian surfaces with constant lighting. The bubble, however, is a complicated, transparent object. Intensity is definitely not conserved, and so S cannot be zero. Because of the difficulty of modeling this term, optical flow is inconvenient for bubble tracking.

Optical flow is further hindered by high bubble speeds. To make accurate derivative computations in optical flow, small differences are required between successive images. When the bubble moves quickly, it is necessary to coarsen the base image before computing optical flow. (Nagahdaripour, 2000) For bubble tracking, it becomes difficult or impossible to find the target in a coarse image.

 4.1.3. Change Detection

The human's ability to observe the moving bubble suggested implementing change detection or motion detection. Since motion detection via optical flow presents severe challenges, change detection was considered.

Change detection algorithms compare subsequent image pairs to locate differences between images. The simplest change detection methods simply difference two images. The most severe problem with this change detection method is its lack of robustness under variations in lighting conditions. In the atmosphere, for example, a cloud that moves in front of the sun may cast a shadow onto a scene. Since the shadow does not represent a structural change to the scene, a robust change detection algorithm will not show the shadow as a region of change.

The review paper of Skifstad discusses several methods for change detection. (Skifstad, 1989) Among these methods, Skifstad recommends two: the derivative model and the shading model. The derivative model uses the 1-norm of the intensity gradient vector as a basis for comparison between images. In the notation, below, the measure of the difference is D. D is computed over a small region A, using values of intensity, I, from the original image.

(Eqs. 2&3)

The shading model method uses a different approach. This model assumes that the surface albedo for all points in the image remains constant and that only the light incident on these surfaces changes over time. Assuming that the light intensity leaving a point, I, can be expressed as the product of a surface shading coefficient, S, and the incident intensity, L:

(Eq. 4)

In regions where the surface shading (albedo) does not change, R will be approximately constant, consequently its variance will be nearly zero. Where objects move across the background image, R will vary across a region. Consequently the variance of R computed over an area will be non-zero for regions in which objects in the scene change, independent of scene lighting changes.

To evaluate which, if any, of these change detectors works successfully, the derivative and shading models were applied to several test images.

(Figure 5)

In this figure, Image A represents an unfiltered still image captured from video. Image B is a straight difference of the subsequent stills. Image C uses the derivative model, with derivatives computed using the Sobel operator. Image D shows the results of the shading model computation. Image E depicts a customized change detector described below. For this particular case, it is clear that the straight difference and the shading model both give poor results. The derivative model and the customized detector, on the other hand, emphasize the bubble (which may be observed as a salient speck at the centers of images C and E).

The one limitation of the derivative model is the noise in the region of the ruler (the vertical white stripe in Image A). This noise is quite strong. In fact the ruler noise in image C has a much larger magnitude than the bubble signal. It is desirable to eliminate this noise.

The ruler is a plastic surface with directional reflectance. Such non-Lambertian surfaces are highly sensitive to slight motions of the light source (vibrations in the light support structure, for instance). Multipath effects (scattering from particles suspended in the seawater) can also create directional lighting variations through time. Similar reflectance noise is observed near metal bolts and edges of the plexiglass test section.

To reduce reflectance noise, the customized method calculates the standard deviation of each pixel over a small population of uncorrelated images (20 or so non-consecutive images). Differences are then normalized by (1+s )-1 where s is the local standard deviation. This normalization greatly enhances bubble differences relative to the background. The customized filter is represented by the following block diagram.

 

(Figure 6)

 

4.1.4. Processing Block -- Implementation Issues

Several engineering issues arose during implementation of these filters. Two of the most important implementation issues were choice of the reference image and removal of the vertical mean.

In Section 4.3, change methods were described as comparisons of subsequent images. Because the bubble tracking experiment occurs against a fixed background, it is possible to compute a representative background image. The background image is computed by averaging uncorrelated (non-consecutive) images in the absence of the bubble. This averaging process smooths reflectance noise. More importantly, using the background image solves an ambiguity problem. If subsequent images are differenced, the bubbles in each image create changes in the differenced image. Thus two blobs appear in the differenced image, one corresponding to the current bubble position, one to the previous bubble position.

It is occasionally necessary to recompute the background image. For example, if the robot camera zoom setting is changed, the old background image becomes invalid. It will be important to consider this fact when the algorithm is implemented on the field robot. Also, the test cell can translate in severe ocean turbulence. This occurs rarely, but was observed for approximately one minute during the 50 minute MBARI video tape. It may be necessary to implement an image registration subroutine to realign the background image during turbulence events.

The removal of the vertical mean proved to be an important procedure as well. The video recording system introduced a different DC gain for each vertical column of the recorded images. This artificial column gain changed over time, so it was important to remove it prior to image processing. Removing the column DC component has the side effect of removing the entire image's DC component. This important step mitigates lighting variation effects, to a limited degree.

 4.2. Segmentation Block

A standard thresholding technique was implemented in the segmentation block. The processed image was binarized, with a threshold set according to the image histogram's 3-standard deviation point. Thus 99% of the binary image was set to zero, and only 1% was set to unity. A blob finding method was implemented to compute the area and centroid of these unity regions. (Horn, 1986.) The processing and segmentation procedure are depicted together in the following illustration.

(Figure 7)

A matched filter process was also considered for the segmentation block. Unfortunately, the bubble looks extremely different from image to image, lighter than the background in patches, darker in other patches. Because of the variability of this translucent target's appearance, it was not possible to find an appropriate bubble template for use in a matched filter correlation scheme.

 4.3. Matching Block

The output of the segmentation block is a list of blobs, their locations, and their areas. This database serves as an input to the matching block.

The matching block attempts to connect blobs through time. The segmentation routine identifies multiple blobs. These blobs may be bubbles, organic matter floating in the water column, or noise from reflective surfaces. The matching block uses motion heuristics to separate bubbles from artifacts.

To this end the matching block builds structures called chains. Each chain consists of a trio of blobs from successive images, as well as data related to those blobs as described by the structure outline, below.

 

 

Five functions operate on the class chain: initiation, propagation, recombination, termination, and locking. These functions are loosely inspired by the theory of reaction mechanisms, a standard topic in the field of chemical kinetics.

4.3.1. Initiation Function

In practice, the matching block instantiates and tracks three distinct chains at all times. Each time a new image is acquired, the initiation function is called to refresh all empty chains. Each new chain is chosen so as to not duplicate an existing chain. Subject to this constraint, chain are initiated based on the minimization of a cost function.

Cost functions are computed for blob combinations across a set of three images. The combination with the lowest cost is selected for initiation.

If there are a large number of blobs in an image, the initiation block may only consider a finite number of the best blobs from each image. (Best blobs are chosen based on blob area alone.) In practice, only four blobs were considered from each image, resulting in 64 possible blob combinations. The cost function is computed as:

(Eq. 5)

The cost function measures how well the third bubble of each combination compares to the first two, both in area and in expected location. The area cost, JArea, becomes large when the area of the third bubble, A3, is not equal to the average area of the first two bubbles.

(Eq. 6)

The location cost, JLocation, becomes large when the third bubble's location, x3, is far from the expected location. The expected location derives from a first order approximation of bubble velocity.

(Eq. 7)

The error between actual and expected blob location is normalized by the distance between the first two bubbles. Since this distance can be zero, the normalization factor is clipped to a minimum value. In practice M was set to .01 (pixels units).

(Eq. 8)

Chain score is initialized to unity for new chains, and the chain lock boolean is set to false.

4.3.2. Propagation Function

At each time step the propagation function is called to adjust the score of a particular chain. Chain scores are increased or decreased to recognize the persistence of chains through time. Artifacts that result from noise may match well over a three image set, but tend not to match over longer periods of time. Scores recognize this fact and reward chains that maintain low cost functions over time.

The cost function for each chain is computed using the formula from section 4.3.1. and compared to costs computed by the initiation function. If the chain's cost is worse than the costs to which it is compared, the chain's score is decremented by 1. Otherwise the chain score is incremented by 1 (to a maximum of 5).

When the score is incremented, the elements of the chain are updated by removing the earliest blob in the chain and adding a new blob at the end of the chain. If the chain is decremented, no new blob is added to the chain.

4.3.3. Recombination Function

Chains, initially distinct, may become identical over time. The recombination function is called to recognize duplicate chains. The duplicate chain with the lower score is set empty.

4.3.4. Termination Function

The termination function recognizes chains whose score has dropped to zero. These chains are set empty.

4.3.5. Lock Function

The lock function selects which chain is used to create an error signal for robot control. Lock is turned on (set to true) for the first chain which reaches a score of 3. Lock is terminated only when the locked chain is set empty.

Thus if the algorithm temporarily fails to find a suitable match to extend the locked chain, the chain remains locked. Control continues to function smoothly using the error signal corresponding to the last reported bubble position.

5. Results

Several test sequences were examined, of which three were recorded on video tape for the class presentation. These three sequences are summarized here.

The first demonstration sequence shows a single bubble against a flat background. The bubble tracking sequence locks on quickly and tracks the bubble through all remaining frames.

The second demonstration sequence begins with a single bubble in the field of view. The background is not flat, but contains reflective surfaces including a plastic ruler and metal bolts. Late in the image sequence a second bubble appears, approaching the original bubble and then moving away. The automated tracker successively holds lock on the original bubble through the entire sequence.

The third demonstration shows a case when the automated bubble tracker breaks down. The image sequence begins with two bubbles dancing side by side in the vicinity of the plastic ruler. The tracker has difficulty distinguishing the two bubbles, and once mistakes a ruler reflection as a bubble. The bubble tracker recovers from this mistake, and continues following the bubble pair until the end of the sequence. Were the two bubbles to separate, the bubble tracker would lock on to one or the other at random.

 6. Future Directions

In order to implement a feedback algorithm, it will be necessary to model the transfer function between actuation and bubble motion. The blob location estimator (section 4.3.1.) must be adjusted to take actuation into account. The new estimator will have the following form, where u is the control output, t is time, and f is the appropriate transfer function.

With small adaptations, this algorithm may be useful for jellyfish tracking, another activity of interest to MBARI researchers. For jellyfish tracking the two dimensional constraint will not be valid. It may be possible to determine jellyfish depth from visual information (depth from focus, depth from stereo, depth from motion) or from a secondary sensor (a laser rangefinder).

Test section configuration should be modified for future experiments. The vision system could be made more robust by removing reflective surfaces from the test section. Also, a set of markings might be added to the test section to permit image registration during brief periods of turbulence.

 7. Conclusions

A three-block vision algorithm has been implemented for off-line bubble tracking. The first block, dubbed the processing block, enhances the image. The second block, dedicated to segmentation, carves the enhanced image into blobs. These blobs are compared by the third block, a matching module. The matching module outputs the location of the most likely target bubble. This information is intended as an error signal for control purposes. A real time version of the algorithm will be implemented in the next month for ocean testing.

 

References: