Noise Reduction in the Wavelet Domain

EE368 Digital Image Processing Class Project
Jungwon Lee

Abstract:

 We implemented four different algorithms of noise removal in the wavelet domain. A linear estimator assumes that the coefficients of the subbands of clean images follow the Gaussian distribution, and a threshold estimator is based on the fact that the pdf of the coefficients at x = 0 is significantly large. A Bayes marginal estimator and a Bayes joint estimator were suggested by Simoncelli and work fairly well compared to the linear estimator and the threshold estimator. The Bayes marginal estimator uses a generalized Laplacian model for the coefficients of subbands, and the Bayes joint estimator utilizes the dependence between a coefficient and the neighborhood of the coefficient. We compared the performance of these four estimators one another and with Wiener filtering, which is a classical solution of a noise removal problem.
 

I. Introduction

 Classically, Fourier and related representations have been used widely in image processing applications. The noise removal has been done using the Wiener filter, which is derived by assuming a signal model of uncorrelated Gaussian-distributed coefficients in the Fourier domain and utilizes second-order statistics of the Fourier coefficients.

 The statistics of many natural images are simplified when they are decomposed via wavelet transform. Recently, many researchers have found that statistics of order greater than two can be utilized in choosing a basis for images. Field [1] has shown that the coefficients of frequency subbands of natural scenes have much higher kurtosis than a Gaussian distribution. Based on this observation Simoncelli [2] suggested an algorithm for noise removal using a non-Gaussian marginal model. Simoncelli [3] also developed a joint non-Gaussian Markov model utilizing the dependence between the coefficients of subbands.

 In this work, we have implemented denoising algorithms in the wavelet domain using a linear estimator, a threshold estimator and the estimators based on the two statistical models for the coefficients in fixed multi-scale bases suggested by Simoncelli and compare their performance one another and with Wiener filtering.

 

II. Estimators

 Consider an image contaminated by an additive white Gaussian noise. Since the wavelet transform is orthonormal, the noise is also Gaussian and white in the wavelet domain. The coefficient of the noisy image in the wavelet domain is expressed as y = c + n, where c is the coefficient of the clean image and n is the coefficient of the noise which follows the Gaussian distribution.

1. Linear Estimator

 If we assume that the coefficients of the subbands of clean images follow the Gaussian distribution, we can think of the following linear estimator for clean coefficients.

, where .

2. Threshold Estimator

 Field [1] has shown that kurtosis of subbands of natural scenes varies with filter bandwidth and is maximal at roughly one octave. We can see from figure 1 that wavelet subband coefficients have highly non-Gaussian statistics. Thus, the linear estimator may not be a good estimator for coefficients in the wavelet domain.

 Since the pdf of coefficients at c = 0 is significantly large, we can assume that small amplitude values of the coefficients arose from a value of x = 0. Thus, we can think of a following threshold estimator.


3. Bayesian Marginal (Coring) Estimator

 Although the threshold estimator is based on the property that the pdf of coefficients at c=0 is significantly large, it does not exploits the whole pdf. The Bayesian marginal estimator uses the whole pdf for estimation, using a pdf model for clean coefficients. Mallat [4] proposed a two-parameter generalized Laplacian distribution for the densities of subband coefficients of separable wavelet decompositions:

  , where .

 We fitted the histogram of the coefficients of subbands to this model by minimizing the relative entropy (Kullback-Leibler divergence). The dashed green line in figure 1 is the best fit of the histogram. The generalized Laplacian distribution model fits the histogram so well that it produces very small relative entropy compared to the total entropy.

 


Figure 1.  Histograms of coefficients for a single wavelet subband of an "Einstein" image
Above the histogram is the model parameters p and s and the relative entropy 
of the model and histogram as a fraction of the total entropy of the histogram.

 

 Then, the optimal estimate of c in the least mean square sense using this model is the conditional mean:

 Figure 2 shows a Bayesian marginal estimator for this model with p = 0.58. As you can see from figure 2, large amplitude coefficients are preserved, and small amplitude coefficients are suppressed.

 


Figure 2. Bayesian least-squares estimator with p=0.58.
The dashed green line is a line with slope 1 for comparison.

 


 Assuming that we know the variance of the additive white Gaussian noise, we can find the parameters p and s from the noisy image. 
Using the second and fourth moments of a generalized Laplacian signal corrupted by additive white Gaussian noise, the parameters s and p are obtained from the following two equations:


4. Bayesian Joint Estimator

 Figure 3 shows the magnitudes of coefficients in a separable wavelet decomposition of the “Einstein” image. You can see from figure 3 that wavelet coefficients are not statistically independent. Large-magnitude coefficients tend to occur close to one another within subbands and also occur at the same relative spatial locations of subbands at adjacent scales and orientations [5, 6].

 

Figure 3. Coefficient magnitudes in a three-level separable wavelet decomposition of the "Einstein" image

 

 The Bayesian marginal estimator uses for estimation the dependence between coefficients that the Bayes marginal estimator ignores. Figure 4A shows conditional histograms for a child coefficient conditioned on a parent coefficient. Note that a child coefficient and a parent coefficient are not independent. Figure 4B shows the conditional histogram H( log2(c2) | log2(p2 ). The right side of the histogram shows that the conditional expectation E(c2|p2) is approximately proportional to p2.  Since Var(c2|p2)=E(c2|p2), this suggests that the variance of child coefficients conditioned on a parent coefficient are linearly dependent on the squared magnitude of the parent coefficient.

 


A

 
B                                                                                     C

Figure 4. Conditional histograms for the finest scale vertical coefficients of 256*256 "Einstein" image.
Brightness of a pixel is proportional to the probability. Each column has been independently rescaled
 so that the maximum probability in each column corresponds to the maximum brightness.
A: Conditioned on the parent coefficient (same location and orientation, coarser scale),
B: Same as A, but the x axis and y axis are log2(|P|) and log2(|C|), respectively,
C: Conditioned on a linear combination of neighboring coefficients


 Noting that the variance of child coefficients conditioned on a parent coefficient are linearly dependent on the magnitude of the parent coefficient,
Simoncelli suggested a Markov model, in which the density of a coefficient, c, is conditionally Gaussian with variance a linear function of the squared coefficients in a neighborhood:

,where the neighborhood {pk} consists of coefficients at adjacent spatial locations, other orientations and adjacent scales.

 Figure 4C shows conditional histograms H( log2(c2) | log2(L2) ). Here, L2 is a linear combination of the squared coefficients in a neighborhood:

 We used a neighborhood consisting of 4 nearest spatial neighbors and a single parent for figure 4C and for experiments, and the coefficients are chosen to minimize the mean square error between the child coefficient and a linear combination as explained later in detail. The histogram of figure 4C is similar to figure 4B, which supports the Markov model suggested by Simoncelli.

 To find the Bayesian joint estimator, we used the result of the Bayesian marginal estimator. We estimated the weight parameters {wk} and the constant b by minimizing the mean square error:

 Finding {wk} and b minimizing the mean square error is a typical problem of linear algebra and we can find them from the following setup. Let's define vectors x and y and a matrix A as follows:

Then, the mean square error can be represented as

Therefore, the vector x that minimizes the squared error is

 We can find the Bayesian joint estimator using the noisy coefficients y, the estimated parameters {wk} and b, and the neighboring coefficients estimated using the Bayesian marginal estimator:

 

IV. Choice of Basis

 For the test of the estimators, we used a separable critically-sampled quadrature mirror filter (QMF) decomposition. This is a linear-phase approximation to an orthonormal wavelet decomposition. The basis functions are symmetric, but QMF does not allow perfect reconstruction. The lowpass filter is l[n] = [0.028074, -0.060945, -0.073387, 0.41473, 0.79739, 0.41473, -0.073387, -0.060945, 0.028074], and the highpass filter is h[n] = (-1)nl[N-n+1]. Figure 5 shows a single-scale system for a critically-sampled QMF decomposition in one dimension. Since QMF decomposition is separable, two-dimensional decomposition can be done by applying the one-dimensional decomposition to an image vertically and then to resulting subbands horizontally.

 

Figure 5. Single-scale QMF decomposition system in one dimension. 
Putting this whole system into the filled circle makes a multi-scale decomposition system.

 

 

V. Results

 Table 1 and Figure 6 shows signal-to-noise ratios (SNRs) for all four algorithms and Wiener filtering at 8 different levels of noise for an Einstein image. We can see that the SNRs of the Bayesian algorithms are significantly greater than the SNRs of the linear estimator and the threshold estimator. Bayesian algorithms outperforms Wiener filtering at all levels of noise and especially at higher levels of noise, and the Bayesian joint estimator outperforms the Bayesian marginal estimator. When the noise level is low, the linear estimator and the threshold estimator are inferior to Wiener filtering.

 

Noisy image
SNR (dB)

Linear Estimator
SNR (dB)

Threshold Estimator
SNR (dB)

Bayesian Marginal Estimator
SNR (dB)

Bayesian Joint Estimator
SNR (dB)

Wiener Filter
SNR (dB)

0.39

7.58

7.21

8.41

8.47

7.28

1.62

8.08

7.94

9.17

9.36

8.20

3.30

8.85

9.35

10.43

10.67

9.65

4.31

9.29

9.89

11.13

11.36

10.41

6.34

10.31

11.17

12.47

12.77

11.94

8.99

11.86

12.50

14.05

14.30

13.87

10.04

12.53

13.07

14.70

14.97

14.56

11.79

13.76

14.00

15.77

15.89

15.74

Table 1. Noise removal results for all four algorithms in the wavelet domain and Wiener filtering.

 

Figure 6. Noise removal results for all four algorithms in the wavelet domain and Wiener filtering.
The pixel values of an image are in the range [0, 255].

 

 Figure 7 shows the original "Einstein" image, the noisy image with SNR = 3.30dB, and resulted images of  noise removal algorithms. The Bayesian results appear both sharper and less noisy than the linear estimator since high-amplitude coefficients are preserved and low-amplitude coefficients are suppressed in the Bayesian algorithms. We can clearly see artifacts in the threshold estimator and the artifacts are least prominent in the Bayesian joint estimator among the four estimators except the Wiener filtered image.

 


A                                                                                 B


C                                                                                 D


E                                                                                  F


G

Figure 7. Noise removal results using a QMF
A: Original "Einstein" image, B: Noisy image (SNR=3.30dB), C: Linear estimator (SNR=8.85dB), D: Optimal thresholding (SNR=9.35dB)
E: Bayes - marginal model (SNR=10.43dB), F: Bayes - joint model (SNR=10.67dB), G: Wiener filtering (SNR=9.65dB)

 

 Figure 8 shows the result using a "Smandril" image. As in the "Einstein" image result, the threshold estimator shows artifacts. The Bayes marginal estimator is the best both visually and by SNR. The Bayesian joint estimator blurs the image and the SNR is not so good as the Bayes marginal estimator. However, the Bayesian joint estimator still gives the better result than the Wiener filter.

 

A                                                                                                        B

C                                                                                                        D

E                                                                                                        F

G

Figure 8. Noise removal results using a QMF
A: Original "Smandril" image, B: Noisy image (SNR=-1.13dB), C: Linear estimator (SNR=6.95dB), D: Optimal thresholding (SNR=5.04dB)
E: Bayes - marginal model (SNR=7.06dB), F: Bayes - joint model (SNR=6.81dB), G: Wiener filtering (SNR=5.04dB)

 

 In case of the linear estimator and threshold estimator, the amount of computation is not heavy and the programs implementing those estimators run very fast. The Bayesian marginal estimator requires a little more computation than the linear estimator, and the Bayesian joint estimator requires relatively heavy computation compared to other estimators, but  run time for the Bayesian joint estimator is a few minutes in Matlab. Therefore, we can see that all the estimators can be run in a short time.

 

VI. Discussion and Future Work

 We have implemented four estimators and a Wiener filter, and we found that Bayesian estimators are superior to Wiener filtering both visually and by SNR. Although the Bayesian joint estimator requires more computation than Bayesian marginal estimator, the Bayesian joint estimator outperforms the Bayesian marginal estimator for various levels of noise.

 Once we implemented the Bayesian joint estimator using a neighborhood consisting of the 12 nearest spatial neighbors (within the same subband), the 5 nearest cousin coefficients (in other orientation bands at the same scale), the 9 nearest parent coefficients (in the adjacent subband of coarser scale), a single aunt (from each orientation at the adjacent coarser scale), and a single grandparent as suggested in [3]. However, the result was not as good as using a neighborhood consisting of the 4 nearest spatial neighbors and a single parent for 256*256 "Einstein" image with a three-level QMF decomposition. We found that the result is significantly different as we change the neighborhood. Therefore, the linear estimate using a neighborhood consisting of many neighbors needs to be validated although it is quite certain that the coefficient variance is linearly dependent on a squared magnitude of a single neighbor.

 It is shown empirically that the results with the over-complete steerable pyramid bases are better than the results with the separable QMF basis in [3]. Since the result of the Bayesian estimators depends on the basis used, choosing an optimal basis adaptively can be an interesting research topic.

 

VII. References

[1] D. J. Field, “What is the goal of sensory coding?” Neural Computation, vol. 6, 1994, pp. 559-601

[2] E. P. Simoncelli and E. H. Anderson, “Noise removal via Bayesian wavelet coring,” Proceedings of the 3rd IEEE International Conference on Image Processing, vol. 1, 1996, pp. 379-382

[3] E. P. Simoncelli, “Bayesian Denoising of Visual Images in the Wavelet Domain”, Bayesian Inference in Wavelet Based Models, eds. P Muller and B Vidakovic, Chapter 18, Springer-Verlag, 1999, pp 291-308

[4] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, July 1989, pp. 674-693

[5] Robert W. Buccigrossi and E. P. Simoncelli, “Image Compression via Joint Statistical Characterization in the Wavelet Domain Image Processing,” IEEE Transactions on Image Processing, vol. 8, Dec. 1999, pp. 1688 -1701

[6] E. P. Simoncelli, “Statistical Models for Images: Compression, Restoration and Synthesis,” Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 1, 1998, pp. 673 -678

[7] D. L. Donoho and I. M. Johnstone, “Threshold selection for wavelet shrinkage of noisy data,” Proceedings of the 16th Annual International Conference of the IEEE, vol. 1, 1994, pp. A24-A25

 

VIII. Source Code

source_code.pdf