Noise
Reduction in the Wavelet Domain
EE368 Digital
Image Processing Class Project
Jungwon Lee
Abstract:
We implemented four different algorithms of noise
removal in the wavelet domain. A linear estimator assumes that the coefficients of the subbands of clean
images follow the Gaussian distribution, and a threshold estimator is based on
the fact that the pdf
of the coefficients at x = 0 is significantly large. A Bayes marginal estimator
and a Bayes joint estimator were suggested by Simoncelli and work fairly well
compared to the linear estimator and the threshold estimator. The Bayes marginal
estimator uses a generalized Laplacian model for the coefficients of subbands,
and the Bayes joint estimator utilizes the dependence between a coefficient and
the neighborhood of the coefficient. We compared the performance of these four
estimators one another and with Wiener filtering, which is a classical solution
of a noise removal problem.
I. Introduction
Classically, Fourier and related representations have been used widely in image processing applications. The noise removal has been done using the Wiener filter, which is derived by assuming a signal model of uncorrelated Gaussian-distributed coefficients in the Fourier domain and utilizes second-order statistics of the Fourier coefficients.
The statistics of many natural images are simplified when they are decomposed via wavelet transform. Recently, many researchers have found that statistics of order greater than two can be utilized in choosing a basis for images. Field [1] has shown that the coefficients of frequency subbands of natural scenes have much higher kurtosis than a Gaussian distribution. Based on this observation Simoncelli [2] suggested an algorithm for noise removal using a non-Gaussian marginal model. Simoncelli [3] also developed a joint non-Gaussian Markov model utilizing the dependence between the coefficients of subbands.
In this work, we have implemented denoising algorithms in the wavelet domain using a linear estimator, a threshold estimator and the estimators based on the two statistical models for the coefficients in fixed multi-scale bases suggested by Simoncelli and compare their performance one another and with Wiener filtering.
II. Estimators
Consider an image contaminated by an additive white Gaussian noise. Since the wavelet transform is orthonormal, the noise is also Gaussian and white in the wavelet domain. The coefficient of the noisy image in the wavelet domain is expressed as y = c + n, where c is the coefficient of the clean image and n is the coefficient of the noise which follows the Gaussian distribution.
1. Linear Estimator
If we assume that the coefficients of the subbands of clean images follow the Gaussian distribution, we can think of the following linear estimator for clean coefficients.
![]()
, where
.
2. Threshold Estimator
Field [1] has shown that kurtosis of subbands of natural scenes varies with filter bandwidth and is maximal at roughly one octave. We can see from figure 1 that wavelet subband coefficients have highly non-Gaussian statistics. Thus, the linear estimator may not be a good estimator for coefficients in the wavelet domain.
Since the pdf of coefficients at c = 0 is significantly large, we can assume that small amplitude values of the coefficients arose from a value of x = 0. Thus, we can think of a following threshold estimator.
3. Bayesian Marginal (Coring) Estimator
Although the threshold estimator is based on the property that the pdf of coefficients at c=0 is significantly large, it does not exploits the whole pdf. The Bayesian marginal estimator uses the whole pdf for estimation, using a pdf model for clean coefficients. Mallat [4] proposed a two-parameter generalized Laplacian distribution for the densities of subband coefficients of separable wavelet decompositions:
.
We fitted the histogram of the coefficients of subbands to this model by minimizing the relative entropy (Kullback-Leibler divergence). The dashed green line in figure 1 is the best fit of the histogram. The generalized Laplacian distribution model fits the histogram so well that it produces very small relative entropy compared to the total entropy.

Figure 1. Histograms of coefficients for a single wavelet subband of an "Einstein" image
Above the histogram is the model parameters p and s and the relative
entropy
of the model and histogram as a fraction of the total entropy of the histogram.
Then, the optimal estimate of c in the least mean square sense using this model is the conditional mean:
Figure 2 shows a Bayesian marginal estimator for this model with p = 0.58. As you can see from figure 2, large amplitude coefficients are preserved, and small amplitude coefficients are suppressed.

Figure 2. Bayesian least-squares estimator with p=0.58.
The dashed green line is a line with slope 1 for comparison.
Assuming that we know the variance of the additive white Gaussian noise,
we can find the parameters p and s from the noisy image. Using the
second and fourth moments of a generalized Laplacian signal corrupted by
additive white Gaussian noise, the parameters s and p are obtained from the
following two equations:
4. Bayesian Joint Estimator
Figure 3 shows the magnitudes of coefficients in a separable wavelet decomposition of the “Einstein” image. You can see from figure 3 that wavelet coefficients are not statistically independent. Large-magnitude coefficients tend to occur close to one another within subbands and also occur at the same relative spatial locations of subbands at adjacent scales and orientations [5, 6].

Figure 3. Coefficient magnitudes in a three-level separable wavelet decomposition of the "Einstein" image
The Bayesian marginal estimator uses for estimation the dependence between coefficients that the Bayes marginal estimator ignores. Figure 4A shows conditional histograms for a child coefficient conditioned on a parent coefficient. Note that a child coefficient and a parent coefficient are not independent. Figure 4B shows the conditional histogram H( log2(c2) | log2(p2 ). The right side of the histogram shows that the conditional expectation E(c2|p2) is approximately proportional to p2. Since Var(c2|p2)=E(c2|p2), this suggests that the variance of child coefficients conditioned on a parent coefficient are linearly dependent on the squared magnitude of the parent coefficient.

A

B
C
Figure 4. Conditional histograms for the finest scale vertical coefficients of
256*256 "Einstein" image.
Brightness of a pixel is proportional to the probability. Each column has been
independently rescaled
so that the maximum probability in each column corresponds to the maximum
brightness.
A: Conditioned on the parent coefficient (same location and orientation, coarser
scale),
B: Same as A, but the x axis and y axis are log2(|P|) and log2(|C|),
respectively,
C: Conditioned on a linear combination of neighboring coefficients
Noting that the variance of child coefficients conditioned on a
parent coefficient are linearly dependent on the magnitude of the parent
coefficient, Simoncelli
suggested a Markov model, in which the density of a coefficient, c, is
conditionally Gaussian with variance a linear function of the squared
coefficients in a neighborhood:
,where the neighborhood {pk} consists of coefficients at adjacent spatial locations, other orientations and adjacent scales.
Figure 4C shows conditional histograms H( log2(c2) | log2(L2) ). Here, L2 is a linear combination of the squared coefficients in a neighborhood:
We used a neighborhood consisting of 4 nearest spatial neighbors and a single parent for figure 4C and for experiments, and the coefficients are chosen to minimize the mean square error between the child coefficient and a linear combination as explained later in detail. The histogram of figure 4C is similar to figure 4B, which supports the Markov model suggested by Simoncelli.
To find the Bayesian joint estimator, we used the result of the Bayesian marginal estimator. We estimated the weight parameters {wk} and the constant b by minimizing the mean square error:
Finding {wk} and b minimizing the mean square error is a typical problem of linear algebra and we can find them from the following setup. Let's define vectors x and y and a matrix A as follows:
Then, the mean square error can be represented as
Therefore, the vector x that minimizes the squared error is
We can find the Bayesian joint estimator using the noisy coefficients y, the estimated parameters {wk} and b, and the neighboring coefficients estimated using the Bayesian marginal estimator:
IV. Choice of Basis
For the test of the estimators, we used a separable critically-sampled quadrature mirror filter (QMF) decomposition. This is a linear-phase approximation to an orthonormal wavelet decomposition. The basis functions are symmetric, but QMF does not allow perfect reconstruction. The lowpass filter is l[n] = [0.028074, -0.060945, -0.073387, 0.41473, 0.79739, 0.41473, -0.073387, -0.060945, 0.028074], and the highpass filter is h[n] = (-1)nl[N-n+1]. Figure 5 shows a single-scale system for a critically-sampled QMF decomposition in one dimension. Since QMF decomposition is separable, two-dimensional decomposition can be done by applying the one-dimensional decomposition to an image vertically and then to resulting subbands horizontally.
Figure 5. Single-scale QMF decomposition
system in one dimension.
Putting this whole system into the filled circle makes a multi-scale
decomposition system.
V. Results
Table 1 and Figure 6 shows signal-to-noise ratios (SNRs) for all four algorithms and Wiener filtering at 8 different levels of noise for an Einstein image. We can see that the SNRs of the Bayesian algorithms are significantly greater than the SNRs of the linear estimator and the threshold estimator. Bayesian algorithms outperforms Wiener filtering at all levels of noise and especially at higher levels of noise, and the Bayesian joint estimator outperforms the Bayesian marginal estimator. When the noise level is low, the linear estimator and the threshold estimator are inferior to Wiener filtering.
|
Noisy image |
Linear
Estimator |
Threshold
Estimator |
Bayesian
Marginal Estimator |
Bayesian
Joint Estimator |
Wiener
Filter |
|
0.39 |
7.58 |
7.21 |
8.41 |
8.47 |
7.28 |
|
1.62 |
8.08 |
7.94 |
9.17 |
9.36 |
8.20 |
|
3.30 |
8.85 |
9.35 |
10.43 |
10.67 |
9.65 |
|
4.31 |
9.29 |
9.89 |
11.13 |
11.36 |
10.41 |
|
6.34 |
10.31 |
11.17 |
12.47 |
12.77 |
11.94 |
|
8.99 |
11.86 |
12.50 |
14.05 |
14.30 |
13.87 |
|
10.04 |
12.53 |
13.07 |
14.70 |
14.97 |
14.56 |
|
11.79 |
13.76 |
14.00 |
15.77 |
15.89 |
15.74 |
Table 1. Noise removal results for all four algorithms in the wavelet domain and Wiener filtering.

Figure 6. Noise removal results for all four
algorithms in the wavelet domain and Wiener filtering.
The pixel values of an image are in the range [0, 255].
Figure 7 shows the original "Einstein" image, the noisy image with SNR = 3.30dB, and resulted images of noise removal algorithms. The Bayesian results appear both sharper and less noisy than the linear estimator since high-amplitude coefficients are preserved and low-amplitude coefficients are suppressed in the Bayesian algorithms. We can clearly see artifacts in the threshold estimator and the artifacts are least prominent in the Bayesian joint estimator among the four estimators except the Wiener filtered image.

A
B

C
D

E
F

G
Figure 7. Noise removal results using a QMF
A: Original "Einstein" image, B: Noisy image (SNR=3.30dB), C: Linear
estimator (SNR=8.85dB), D: Optimal thresholding (SNR=9.35dB)
E: Bayes - marginal model (SNR=10.43dB), F: Bayes - joint model (SNR=10.67dB),
G: Wiener filtering (SNR=9.65dB)
Figure 8 shows the result using a "Smandril" image. As in the "Einstein" image result, the threshold estimator shows artifacts. The Bayes marginal estimator is the best both visually and by SNR. The Bayesian joint estimator blurs the image and the SNR is not so good as the Bayes marginal estimator. However, the Bayesian joint estimator still gives the better result than the Wiener filter.
A B
C D
E F
G
Figure 8. Noise removal results using a QMF
A: Original "Smandril" image, B: Noisy image (SNR=-1.13dB), C: Linear
estimator (SNR=6.95dB), D: Optimal thresholding (SNR=5.04dB)
E: Bayes - marginal model (SNR=7.06dB), F: Bayes - joint model (SNR=6.81dB),
G: Wiener filtering (SNR=5.04dB)
In case of the linear estimator and threshold estimator, the amount of computation is not heavy and the programs implementing those estimators run very fast. The Bayesian marginal estimator requires a little more computation than the linear estimator, and the Bayesian joint estimator requires relatively heavy computation compared to other estimators, but run time for the Bayesian joint estimator is a few minutes in Matlab. Therefore, we can see that all the estimators can be run in a short time.
VI. Discussion and Future Work
We have implemented four estimators and a Wiener filter, and we found that Bayesian estimators are superior to Wiener filtering both visually and by SNR. Although the Bayesian joint estimator requires more computation than Bayesian marginal estimator, the Bayesian joint estimator outperforms the Bayesian marginal estimator for various levels of noise.
Once we implemented the Bayesian joint estimator using a neighborhood consisting of the 12 nearest spatial neighbors (within the same subband), the 5 nearest cousin coefficients (in other orientation bands at the same scale), the 9 nearest parent coefficients (in the adjacent subband of coarser scale), a single aunt (from each orientation at the adjacent coarser scale), and a single grandparent as suggested in [3]. However, the result was not as good as using a neighborhood consisting of the 4 nearest spatial neighbors and a single parent for 256*256 "Einstein" image with a three-level QMF decomposition. We found that the result is significantly different as we change the neighborhood. Therefore, the linear estimate using a neighborhood consisting of many neighbors needs to be validated although it is quite certain that the coefficient variance is linearly dependent on a squared magnitude of a single neighbor.
It is shown empirically that the results with the over-complete steerable pyramid bases are better than the results with the separable QMF basis in [3]. Since the result of the Bayesian estimators depends on the basis used, choosing an optimal basis adaptively can be an interesting research topic.
VII.
References
[1] D. J. Field, “What is the goal of sensory coding?” Neural
Computation, vol. 6, 1994, pp. 559-601
[2] E. P. Simoncelli and E. H. Anderson, “Noise removal via Bayesian wavelet
coring,” Proceedings of the 3rd IEEE International Conference on Image
Processing, vol. 1, 1996, pp. 379-382
[3] E. P. Simoncelli, “Bayesian Denoising of Visual Images in the Wavelet
Domain”, Bayesian Inference in Wavelet Based Models, eds. P Muller and B
Vidakovic, Chapter 18, Springer-Verlag, 1999, pp 291-308
[4] S. G. Mallat, “A theory for multiresolution signal decomposition: The
wavelet representation,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 11, July 1989, pp. 674-693
[5] Robert W. Buccigrossi and E. P. Simoncelli, “Image Compression via Joint Statistical Characterization in the Wavelet Domain Image Processing,” IEEE Transactions on Image Processing, vol. 8, Dec. 1999, pp. 1688 -1701
[6] E. P. Simoncelli, “Statistical Models for Images: Compression, Restoration and Synthesis,” Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 1, 1998, pp. 673 -678
[7] D. L. Donoho and I. M. Johnstone, “Threshold selection for wavelet shrinkage of noisy data,” Proceedings of the 16th Annual International Conference of the IEEE, vol. 1, 1994, pp. A24-A25
VIII. Source Code
source_code.pdf