Junjie Wang, Haixin Tie
Department of Electrical
Engineering
Stanford University
1.
Introduction
In this project, we will investigate the application of nonlinear noise reduction techniques presented in literatures. Specifically, we will focus on two approaches, namely wavelet-transform method and local information-based method. We have implemented both algorithms, the performance is studied under different noise patterns. The first approach, wavelet-transform denoising, shows its power under Gaussian and speckle noise; whereas the second technique, local contrast entropy maximizing method, presents its good performance under the salt&pepper noise.
For the transform-domain denoising approach, we continue our efforts to propose an innovative noise level estimation algorithm and a blind threshold searching scheme, mainly to tackle the problem of unknown noise variance and non-Gaussian noise models respectively. Accurate estimate and good denoising performance are obtained. For the local information-based approach, we re-examined the local contrast entropy defined by Beghdadi and Khellaf [5], and the optimal threshold value is adjusted based on both theoretical analysis and experimental results. We propose that this algorithm should have the best performance with respect to discrete noises, in which noisy pixels are mostly disjoint from one another. The improved algorithm is applied and the resulting performance is compared with that of the median filter.
This project report is organized as follows. Section 2 is dedicated to the study of wavelet-transform-domain method. We start with the theoretical background of application of Discrete Wavelet Transform (DWT) with thresholding to the image denoising, with optimal threshold proved by Dononho and Johnstone [1] for AWGN. Experiment results obtained by using Redundant Discrete Wavelet Transform are demonstrated in Section 2.2. Typical questions raised on this methodology are then “what if the noise variance is unknown?” or “what if the noise is non-Gaussian?” For the first question, an innovative noise level estimation algorithm is proposed and extremely accurate estimate is shown in Section 2.3. For the second question, we develop blind threshold searching scheme and apply it to the speckle noise pattern in Section 2.4, the results demonstrate the good performance of this technique. Section 3 focuses on the study of the local information-based noise reduction technique. Section 3.1 first introduces necessary definitions and the hypothesis of maximizing local contrast entropy, and then the choice of the best threshold value is studied. Following that is the description of this local contrast entropy based noise reduction algorithm. Section 3.2 then experiments the algorithm on the standard 256 x 256 x 256 gray level Einstein image and the resulting image is compared with that of a median filter both objectively and subjectively.
2. Wavelet Transform Based Denoising Techniques
2.1
Selective Wavelet Reconstruction
Wavelet theory has been extensively studied in recent years as a promising tool in image compression and noise reduction [1-4]. In their papers [1,2], Donoho and Johnstone developed a theoretical Discrete Wavelet Transform (DWT) framework for the estimation of signals corrupted by additive white Gaussian noise (AWGN). A principle named selective wavelet reconstruction is proposed and it is shown to yield asymptotically optimal estimates for a wide variety of signals.
Suppose we have a corrupted signal
yi = xi
+ ni
where ni are
i.i.d. Gaussian noise
. The objective is to reconstruct the optimal estimator
for desired signal xi, which
yields the minimum mean square error.
The scheme of selective
wavelet reconstruction is illustrated in Fig. 1. The corrupted image is
first transformed to a set of wavelet coefficients, that is
, where
and z are the
coefficients corresponding to the desired signal and the noise. A thresholding process is then applied to
the wavelet coefficients, i.e.
, where t is the threshold value. Finally, an inverse
transform on the thresholded coefficients yields the estimator
.

Figure 1: Block diagram for DWT-based denoising scheme
There are two threshold rules proposed in [1,2]:
The intuition behind this approach is that the neighboring pixels exhibit high correlation, which translates to only a few large wavelet coefficients. On the other hand, the noise is evenly distributed among the coefficients and is generally small. With a properly chosen threshold value, the noise can be efficiently suppressed. It is proved in [2] that the optimal threshold value is
, (1)
where N is the block size in the wavelet
transform. Hard threshold rule usually leads to small mse where
soft-threshold can maintain the smoothness of the original signal. Both
threshold rules will be implemented in our project.
2.2
Redundant Discrete Wavelet Transform
In our algorithm, we choose N=8 and we apply either
Haar Transform (HT) or Discrete Cosine Transform (DCT) as the transform basis.
Strictly speaking, DCT is not a wavelet transform, it is chosen because of its
good energy packing property so that the desired signal resides in only a few
positions in the transform domain, thereby leading to smaller thresholding
artifacts. Experiments show that the two transforms yield comparable
performance. Fig. 2 shows the results from RDWT approach, where the original
image is corrupted by AWGN with normalized variance
(defined as the signal is scale to the range of (0,1)). Hard
and soft threshold rule are both applied. Note that in [2] the optimal
threshold value for soft threshold rule is derived under a minimax criterion,
that implies the possibility of improve the denoising performance by using a
different threshold value for a specific image and transform. In the
experiment, the threshold value
is used.

(a) Original image
(b) Image corrupted by AWGN
(
)

(c) HT-based denoising using (d) HT-based denoising using
hard threshold rule, mse=98.6 soft threshold rule, mse=102.0

(e) DCT-based denoising using (f) DCT-based denoising using
hard threshold rule, mse=88.1 soft threshold rule, mse=104.2
2.3 Noise Level Estimation
First, we introduce a Hypothesis of “Noise Removal Saturation” motivated by the following observations:
· By thresholding, we are killing high-frequency components mainly from noise.
· When we increase the threshold value, the removed “noise” level will also increase (it actually consists of noise and high frequency components from the image), but there will be a point where the high-frequency signal components are “exhausted”.
· At this point, the removed noise level fails to keep up with the rate of increasing estimated threshold, i.e. it enters the saturation region.
· This point corresponds to a good estimation of true noise level.
This hypothesis is well verified in empirical experiments. As we vary the threshold (recall that the threshold is proportional to the noise level for AWGN case) to denoise the image, the standard deviation of removed “noise” (consisting of noise and high frequency signal components) is calculated. Fig. 3(b) reveals a clear transition of the dashed line. That is, when the threshold is well below the optimal value, the removed noise level increases steadily with the threshold, which is referred to as a linear region. However, it starts to saturate at a certain point, i.e. it enters a saturation region. This transition point perfectly matches the optimal threshold value (and consequently the true noise level). As a result shown in Fig. 3(a), this transition point corresponds to the best output image with the smallest mse. Therefore, our noise level estimation boils down to the problem of finding such a transition point by varying the threshold in the case where the noise intensity is unknown.

Figure 3: Empirical verification of “Noise Removal Saturation”
Hypothesis
Based on the hypothesis, we propose an iterative
algorithm for noise level estimation, which is graphically illustrated in Fig.
4. The basic idea is that, we start with two sets of threshold a
and b so that they lie on
two ends of the curve in Fig. 4. Then we move both thresholds by a small amount
towards the center, for each successive pair of (ak-1, ak)
and (bk-1, bk), we obtain a set of
standard deviation of the removed noise
and
under corresponding thresholds. By linear interpolation on
both sets, we obtain an intersection point. This procedure is repeated until
the two successive intersection points differs within a given accuracy range.
As shown in Fig. 4, a trajectory of point 1 to 2 to 3 to 4 and so forth is
swept until it finally converges to a certain point, namely the estimate of the
noise energy.

Figure 4: Noise level estimation algorithm
The algorithm is summarized as follows:
1.
Choose small
and large
as the initial
threshold so that they lie on the two piecewise sections of the curve. Choose
an accuracy requirement
and a step size
. Denoise the image using both thresholds and obtain the
standard deviation of the removed noise A0 and B0. Set
k=1.
2.
Let
,
, do HT- or DCT- based denoising using hard threshold rule
and calculate Ak and Bk. Find the intersection point
of the line1 extrapolated with {Ak-1, Ak}
and line2 extrapolated with {Bk-1, Bk}.
3.
If
, set k=k+1, go to step 2; otherwise stop with the estimate ![]()
The algorithm is tested on two images corrupted by AWGN
with
=0.1 and 0.145. The estimates are tabulated in Table 1. Both
HT- and DCT-based estimator yields excellent accuracy of approximately 1% and
6% respectively.
|
True value |
HT-based estimator |
Accuracy
|
DCT-based estimator |
Accuracy
|
|
0.1 |
0.1004 |
0.4% |
0.1013 |
1.3% |
|
0.145 |
0.1373 |
5.0% |
0.1366 |
5.8% |
Table 1: Noise level
estimation
2.4 Blind Threshold Searching with Non-Gaussian
Noise
There are a variety of noise model in practice, which
deviates far from the AWGN case. For example speckle noise operates on the
image with multiplicative nature, that is, y=x(1+n) . Even the Gaussian
noise could be distorted by the clipping in digital processing, especially
under high noise intensity. As mentioned above, the optimal threshold value
derived in [1,2] is only applicable to AWGN, leaving the non-Gaussian case
unclear in choosing the correct threshold value. In this subsection, we propose
a blind threshold searching scheme based on the Hypothesis of “Noise Removal
Saturation”, but we are targeting at directly finding the best threshold
value instead of noise level. (Recall that
doesn’t hold any
more!). That is, we vary the threshold in a certain range and calculate the
energy of the removed noise, the transition point between the linear region and
the saturation region is determined to be the optimal threshold value.
As an important note, even for pure Gaussian noise case the blind threshold searching method can have practical significance since as we pointed out in Section 2.2, the optimal threshold value is derived under minimax criterion, implying the possibility of enhancing the denoising performance by adjusting the threshold for any particular image and wavelet form used. The power of this technique is demonstrated in our example of denoising images with speckle noise. The denoising process is depicted in Fig. 5, where we first take the logarithm of noisy image, after the threshold searching and RDWT denoising, we reverse the log operation to generate the output image.

Figure 5: Block diagram for
denoising speckle noise
Fig. 6 shows the denoised output where n is uniformly distributed with variance of 0.04 in the noisy image. The mse is measured to be 70.7.

(a) Image corrupted with
speckle noise (b) Denoised image
Figure 6: Denoised image for speckle noise
3. Local Information-Based Noise Reduction
3.1
Local contrast entropy: the definition, hypothesis, and the algorithm
For images contaminated with discrete noises, in which case noisy pixels are randomly distributed and are mostly disjoint from one another, automatic identification of noisy pixels is very important. To tell whether an image pixel is noisy or not, a straightforward thought is to compare its gradient gray level with that of adjacent pixels. If the gradient gray level of the pixel being examined is significantly different from that of adjacent pixels, this pixel may be thought as a "noisy" pixel and then gets corrected, it is thought be a valuable pixel and retain its value. However, a major disadvantage of this thought is that it will also erroneously "correct" informative pixels located at or near transitional regions, such as lines and edges, and therefore will blur the image. Therefore, it is a natural idea to take higher-order statistics of the image into consideration. Local contrast probability is defined under this idea.
To define the "local contrast entropy", it
is necessary to first define the "associated contrast" as:
![]()
(2)
where
is the mean gray level
of the surrounding region (for simplicity a square block) centered at the pixel
being considered, and
is the average
gradient level along all directions. Taking the absolute value of the
difference between
and
makes sure that a
deviation from the average local gradient gray level along both positive and
negative directions will both result in a increase of Ci. The
"local contrast probability" is then defined as:
(3)
where n is the window size. Note that different from
the local contrast definition, the window size in here also counts the central
pixel being considered to ensure that the defined probability is limited in the
[0, 1] region. For example, when the block dimension is 3 x 3, n = 3 x 3 = 9
pixels. In the original paper, the authors implicitly assume that the mean
gradient gray level remains constant in adjacent pixels, and therefore the
can be cancelled out
from denominator and numerator. Based on this assumption, it follows that the
sum of local contrast probability for n pixels in the block is a constant 1.
The local contrast entropy in term of its probability is defined as:
(4)
As a result of the information theory [6], this
function is strictly concave with Pi reaches its global optimum when
and only when
for
, i.e., the Pi values for all the pixels in the
window are identical to one another. When Pi deviates much from its
norm
, the local contrast entropy decreases, resulting a loss of
information hypothetically.
The reasoning above is used by authors of [5] to
conclude that the threshold value
is the optimal value
to use in identifying a noisy point. However, there is one mistake made in the
deviation. The local contrast entropy defined in Eq (4) is indeed a conditional
entropy, in which the probability Pi values for the (n-1) pixel
values surrounding the central pixel are pre-set by the image. If all their
values happen to be identical to 1/n, which in reality corresponds to a
homogenous region, then the conclusion Pc = 1/n is correct; otherwise, when the
local contrast probabilities of the surrounding pixels are different from one
another, no matter that the local contrast probability of the central pixel is,
its corresponding entropy can never reach the global maximum, namely
, and therefore the suggested Pc = 1/n does not hold any
more. To confirm this analysis, a histogram is attached below to show the
distribution of local contrast probability Pi values from the
standard 256 x 256 x 256 gray level Einstein image:


Figure
7: Pi histogram of the standard 256 x 256 x 256 gray level Einstein image
before
and after adding
salt&pepper noise with density = 0.02. (Window size used: 3 x 3)
Shown clearly in Figure 7, the originally proposed
threshold value Pc = 1/n = 1/9 does not agree with the Pi histogram from the
original noise-less image. If this threshold is applied, a great percentage of
the informative pixels will be erroneously identified to be "noisy",
which results a considerable loss of information. Therefore, a larger threshold
value Pc should be used to prevent this undesired effect from happening. On the
other hand, however, this threshold value cannot be too large, otherwise, noisy
pixels may be missed out from the identification process. Therefore the best
threshold lies somewhere in the middle, such that most of the noisy pixels can
be identified correctly without overshooting. The experimental results
presented in the following content will verify the existence of such an
optimum.
Although the local contrast probability only takes
up to the 2nd-order local statistics of an image, namely, the local
contrast probability, into consideration, Figure 7 has been able to identify
some characteristics of the additive discrete noise. From the original image, the
Pi curve show a clear cutoff point at Pi = 0.5, above which there are no
pixels. In addition, the middle section of the curve between 0 and the cutoff
0.5 looks ample, indicating a large percentage of pixels locating in this
section. After adding the noise, a great number of pixels shift to two extreme
sides, namely lower and higher end, which results a decrease of local contrast
entropy. This observation is supportive to the entropy hypothesis.
In this work, the evaluation of the local contrast
probability threshold used in noise identification uses an empirical approach.
By assuming no statistics of the discrete noise, the initial value for the
threshold is set to the suggested value 1/n, which guarantees the
identification of most noisy pixels, but overshoots. Then a higher threshold is
applied and the resulting image is judged by human, which is in effect a
noise-processing component in the iteration. If the noise-reduction process
misses a non-negligible portion of noisy pixels, a new threshold value between
the previous two values is applied. This process continues until the point
where the threshold is just high enough to identify most of the noisy pixels,
which results a minimum impact to the image. This algorithm utilizes that fact
that although human eyes are poor at filtering noise components from images,
they are excellent at identifying them. In essence, human brain is a huge
complicated neural network that is capable of remembering a tremendous amount
of information and generalizing it. Identification of a noisy pixel in an image
is a result of taking a great number of surrounding pixels, if not all, into
account. For example, if there is a white pixel in the middle of the dark pupil
section, one can be sure it is a noisy pixel, because no human pupils are
constructed in that way. On the other hand, if this white pixel is located on a
mark table besides a salt container, one would probability think it as a
particle of salt. Therefore, in order to use a machine to automatically
evaluate the best threshold for the noise-identification process, a more
complicated structure is needed, preferably a neural network emulating
image-processing functionalities of the human brain.
Once the threshold is determined, for each pixel, Pi
is compared with its threshold value Pc. If Pi £ Pc, the pixel is regarded as informative and
the algorithm reserves the pixel gray level; but if Pi > Pc,
which indicates significant decrease of local contrast entropy, the pixel is
regarded as noisy, and the algorithm invokes the mean or median filtering
process (mean filter for faster processing speed and median filter for better
transitional region reservation). In fact, any other spatial-domain
noise-reduction filter can be implemented as preferred.
To visualize this algorithm, Figure 8 is shown below
to show this algorithm works:

Several common cases are studied to show the validity
of this algorithm. Some analysis in the original paper is incorrect is the
correct analysis is given below.
-(1) A single spot point region:
A single spot in the center stands out in a
"flat" gray level window, mathematically resulting a local contrast
probability Pi >> Pc > 1/n. The central spot is regarded as
"noisy". The original paper is correct in this part.
-(2) A homogeneous region:
All pixels in the window have nearly identical gray
levels. Pi ≈ 1/n < Pc. The region is regarded as noise-free. The
original paper claims Pi is very low and thus Pi < Pc. This is incorrect.
Note that even though the numerator in the definition of Pi is very small, so
does the denominator. There is a good chance that Pi > Pc. Though it does
not introduce a significant MSE to the resulting image (the pixel value after
noise-reduction will be nearly the same as the original value because of
homogeneity of surrounding pixels), it results a significant decrease of
efficiency, because noise-reduction algorithms are unnecessarily invoked many
times for homogeneous pixels.
-(3) A critical region:
The number of pixels having a gray level greater
than the mean gray level is the same as that of the complementary set. The
central pixel is regarded as noise-free. An error is most likely to happen in
this case. If we set the Pi to a higher threshold value, we may
avoid some of the erroneous correction to informative pixels. This is a weak
point of the algorithm. A solution to this problem is to increase the window
size and take higher-order statistics into account.
-(4) A transition region:
In a window of n pixels, n1 pixels have
nearly the same gray level A1 and n2 pixels have the gray
level A2. The central pixel has a gray level B.
Pi > Pc > 1/n Û B << min(A1, A2) or B >>
max(A1, A2) (5)
Or to say that the pixel is considered noisy if its
gray level is far from those of its neighbors.
-(5) An actual noisy window region (in which all
pixels contaminated by noise)
Pi of the central pixel will stand out of
those of the surrounding pixels if its deviation from the mean is much greater
than that of surrounding pixels.
The flexibility of this local entropy approach lies
on two parameters. One is the block dimension. For most of the experimental
results in this work, 3 x 3 block dimension is used for faster processing. But
other dimensions like 5 x 5 or even 7 x 7 block dimension can be used for
higher order statistic measure and better accuracy. One may design a neural
network and train it with a training set that contains both the original images
and the noise-contaminated images, such that the neural network can
"learn" to weight the importance of adjacent pixels and make a
correct decision. The other parameter is the threshold, which is dependent on
the statistics of the image set. In this work, we use the iterative process
that has already mentioned above.
3.2 Experimental Results
Experiments are conducted on noisy images
contaminated with discrete-noises, in which noisy pixels are mostly disjointed
from one another. In the following content, two kinds of noisy images are
experimented and the results are compared with those of the median filter. The
image to be analyzed is contaminated with the Salt&Pepper noise. The name
explains itself. Pixels in an image are selected randomly and are turned either
on (completely white) or off (completely black). The density of the noise can
be adjusted through a parameter. To demonstrate its performance, four images,
including the original image, the noisy image, median filtered image, and the
local information-based filtered image, are included in Figure 9 for
comparison.

(a) Original Image (b) + Salt&Pepper noise

(c) Median Filter (d)
Local Contrast Entropy Filter
Figure 9. Performance comparison of local information-based filter and median filter.
(Parameters: Salt&Pepper noise density = 0.02; Local contrast entropy threshold
Pc = 1.6·(1/n); median filter window size: 3 x 3)
As shown in Figure 9, the subjective quality of the
resulting image after this local information-based noise-reduction is
considerably better than that of the median filter. This algorithm identifies
and corrected virtually all the noisy pixels, and at the same time maximally
reserves the informative pixels of the original image. Most details of the
original image, including the wrinkles on the forehead, the stripes on the
suit, and the resolution of the hair, are reserved as a result of this
algorithm. Objectively, the algorithm also showed its superior performance over
the median filter in terms of MSE. The comparison is shown below:
MSE
of the original noisy image: 367.12
MSE
of the median filter image: 33.79
MSE
of the local contrast entropy filter image: 29.75
As any other noise-reduction algorithm, this local
entropy-maximizing algorithm has its limitations. Specifically, this algorithm
requires that the density of noisy pixels be not too high. Given that the
measure of local contrast probability depends only on a limited number of
surrounding pixels, if those pixels are also contaminated by noise, their gray
levels will deviate from their original values, and the probabilistic measure
of the local contrast probability becomes less accurate. This inaccuracy may
not only cause erroneous identification of noisy pixels, but also affect the
output of a spatial-domain filter. For example, in a Gaussian-noise
contaminated image generated in Matlab, the gray level of every pixel in the
image deviates from its original value. When calculating the local contrast,
this algorithm effectively adds up all the noises of the surrounding pixels. In
addition, if one uses a median filter for noise processing, the resulting gray
level is simply the average of two median gray levels of surrounding pixels,
both of which are noisy, and therefore the resulting image may still contain a
significant noise component. This is a common problem among many non-linear
spatial-domain noise-reduction filters. For Gaussian noise, it is better to go
to the frequency domain and use the optimum linear filter. For other non-linear
high-density noises, neural network filter is a good choice in that it can
automatically change its weights to best match higher-order statistics of noisy
images through training.
4.
Conclusion
In this project, we investigated two non-linear noise reduction techniques, namely wavelet-transform method and local information-based method. For the wavelet denoising approach, we studied the theoretical framework proposed in the current literatures. The idea was implemented and good performance was obtained. Furthermore, we proposed a novel noise level estimation algorithm and demonstrated its excellent accuracy. The problem of non-Gaussian noise was also tackled and a blind threshold searching technique is developed with its application to speckle-noise corrupted images. For another kind of noise, namely the discrete noise in which noisy pixels are mostly disjoint from one another, local contrast probability is defined and the local information-based noise reduction algorithm is constructed based on maximizing local information hypothesis. This hypothesis is verified by experimental results. The corresponding theoretical analysis in evaluating the optimal threshold value is also re-examined. Some inaccurate derivation in determining local contrast entropy threshold value in the original paper is investigated and adjusted. Compared with the well-known median filter, the local information-based noise reduction technique is proven to have a superior performance. The pros and cons of this technique are also discussed. The beauty of the algorithm lies on its simplicity in implementation and its power in identifying noisy pixels from informative pixels. In the future, more experiments may be conducted with varying window size and local contrast entropy threshold. The dependence of these two parameters on the image set statistics may also be of particular interest to study.
Appendix
In this project, Junjie Wang is responsible for the wavelet denoising technique, and Haixin Tie is responsible for the local information-based technique.
[1] D. Donoho, “De-noising
by soft-thresholding”, IEEE Trans. Inform. Theory, vol. 41, no. 3, pp.
613-627, 1995.
[2] D. Donoho and I. Johnstone,
“Ideal spatial adaptation by wavelet shrinkage”, Technical Report, Department
of Statistics, Stanford University, 1992.
[3] M. Lang et al,
“Noise reduction using an undecimated discrete wavelet transform”, IEEE
Signal Processing Letters, vol. 3, pp. 10-12, 1996.
[4] C. Burrus, R. Gopinath,
H. Guo, Introduction to wavelet and wavelet transforms, Prentice Hall,
1997.
[5] A. Belhdadi and A.
Khellaf, "A noise-filtering method using a local information
measure," IEEE Trans. Image Processing, vol. 6, no. 6, pp. 879-882,
June 1997
[6] T. M. Cover and J. A.
Thomas, Elements of information theory. pp. 29. New York: John Wiley
& Sons, Inc., 1991