EE368A Project Report
May. 31, 2001
Sangoh Jeong <sojeong@stanford.edu>,
Kihyun Hong <khong@stanford.edu>
Abstract
Although many DCT domain watermarking techniques are not new, they remain important because the DCT is widely used in image and video compression standards. In this project, several DCT-based watermarking algorithms are investigated in detail. The DCT domain watermarking schemes proved robust to many attacks, with the exception of geometric attacks, especially rotation. We address this weakness with dual detection, which combines the DCT domain detection technique with detection of an equivalent spatial-domain form of the watermark embedded in the DCT domain. To test the robustness of the algorithms, simple attacks such as filtering, JPEG compression, rotation, resizing and cropping are applied.
Table of Contents
1. Introduction
2. Basic watermarking technique in the spatial domain
3. Watermarking in the DCT domain
4. Template matching
5. Possible attacks on watermarks
6. Proposed Method
7. Experiments and Results
8. Conclusion and Future Work
9. References
10. Appendix
1. Introduction
In the past decade there has been an explosion in the use and distribution of digital multimedia data. As a result, copyright protection for digital multimedia data has become important, which has brought about
two complementary techniques: encryption and watermarking. Encryption
techniques can be used to protect digital data during the transmission from
the sender to the receiver. After the receiver has received and decrypted
the data, however, the data is identical to the original data and no longer
protected. Watermarking techniques can complement encryption by embedding a
secret imperceptible signal, a watermark, directly into the original data in
such a way that it always remains present. Such a watermark can be used, for instance,
for the following purposes: copyright protection,
fingerprinting, copy protection, broadcast monitoring, data authentication,
indexing and data hiding [1].
Each watermarking application has
its own specific requirements. Therefore, there is no set of requirements to
be met by all watermarking techniques. Nevertheless, some general directions
as in [1][2] can be given for most applications:
Image watermarking techniques proposed so far can be divided into two main groups: those which embed the watermark in the spatial domain, such as [3][4], and those operating in a transform domain, for example the DCT domain, such as [2][5][6]. Techniques can also be distinguished according to the way the watermark is extracted from the possibly distorted version of the marked image. A technique that uses the original image to extract the possible watermark is called a 'coherent' watermarking technique, and a technique that does not use the original image is called a 'blind' watermarking technique.
Because existing spatial watermarking techniques such as [3][4] cannot exploit the frequency characteristics of an image, they are vulnerable to JPEG compression, the most important attack. Existing DCT domain techniques such as [2][5][6], in turn, are weak against geometric attacks, especially rotation, since they rely only on DCT domain detection to extract the watermark. We address these problems with dual detection, which uses an equivalent spatial form of the watermark embedded in the DCT domain to decide whether a possibly watermarked image really contains the watermark. The usual DCT domain detection method is applied first. Geometric attacks, especially rotation, make this detection fail by destroying the synchronization of the DCT coefficients. In that case, the equivalent spatial detection is used, after first recovering the synchronization lost through the rotation of the watermarked image. The well-known template matching (area correlation) [8] is used to resynchronize the geometrically attacked watermarked image. After resynchronization, the usual coherent detection method is applied in the spatial domain: the possible watermark is extracted from the attacked image and compared with the embedded watermark to decide whether the attacked image is the originally watermarked image. As a whole, this scheme increases the robustness of watermark detection.
In the experiment, we first tested several DCT watermarking schemes to determine their robustness to several attacks in the DCT domain. Two different embedding schemes were considered: image-independent DCT domain embedding and image-dependent DCT domain embedding [2][5]. In each case, the 8x8 block-based DCT embedding scheme [6] as well as the whole-image DCT embedding scheme [2][5] were tested to compare their robustness. For detection, coherent detection was used with the image-independent embedding scheme and blind detection was used with the image-dependent embedding scheme. After testing all the DCT domain watermarking schemes against several well-known attacks (JPEG compression, median filtering, low pass filtering, cropping, resizing and rotation), we found that all of them were especially vulnerable to geometric rotation. To handle that case, the template matching mentioned above was extended to a block-based search for synchronization, similar to [3].
2. Basic watermarking technique in the spatial domain
2.1 Embedding of a watermark
The most straightforward way to add a watermark to an image in the spatial domain is to add a pseudorandom noise pattern to the luminance values of its pixels. In general, the pseudorandom noise pattern consists of the integers {-1,0,1}; however, floating-point numbers can also be used. The pattern is generated based on a key using, for instance, seeds, linear shift registers or randomly shuffled binary images. The only constraints are that the energy in the pattern is more or less uniformly distributed and that the pattern is not correlated with the host image content. To create the watermarked image $I_W(x,y)$, the pseudorandom pattern $W(x,y)$ is multiplied by a gain factor $k$ and added to the host image $I(x,y)$, i.e. $I_W(x,y) = I(x,y) + k\,W(x,y)$, as illustrated in Figure 1.

Figure 1. Basic watermark embedding procedure
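The embedding step of 2.1 can be sketched as follows. This is a minimal illustration rather than the code used in the project: it assumes the pattern is scaled by a gain factor `k` before being added, and the names `make_pattern` and `embed_spatial`, the use of NumPy's seeded generator as the key mechanism, and the flat grey host image are all hypothetical choices.

```python
import numpy as np

def make_pattern(key, shape):
    """Key-dependent pseudorandom pattern with values in {-1, 0, 1}."""
    rng = np.random.default_rng(key)          # the key acts as the seed (assumption)
    return rng.integers(-1, 2, size=shape)    # upper bound 2 is exclusive

def embed_spatial(image, key, k=2.0):
    """Add the pattern, scaled by the gain k, to the luminance values."""
    pattern = make_pattern(key, image.shape)
    marked = image.astype(np.float64) + k * pattern
    return np.clip(marked, 0, 255)            # keep values in the 8-bit range

host = np.full((64, 64), 128.0)               # flat grey stand-in for a host image
marked = embed_spatial(host, key=42, k=2.0)
```

A larger `k` makes the watermark easier to detect but also more visible, so in practice the gain is a trade-off between robustness and perceptibility.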
2.2 Detection of the watermark
To detect a watermark in a possibly watermarked image $I'_W(x,y)$, we calculate the correlation between the image $I'_W(x,y)$ and the pseudorandom noise pattern $W(x,y)$. In general, $W(x,y)$ is normalized to zero mean before correlation. Pseudorandom patterns generated using different keys have very low correlation with each other. Therefore, during the detection process the correlation value will be very high for a pseudorandom pattern generated with the correct key and very low otherwise. It is common to set a threshold T to decide whether the watermark is detected or not: if the correlation exceeds T, the watermark detector decides that the image $I'_W(x,y)$ contains the watermark $W(x,y)$. This process is illustrated in Figure 2.

Figure 2 : The detection process for the embedding procedure described in 2.1
During the detection process, the watermark detector can make two types of errors. In the first place, it can detect the existence of a watermark, although there is none. This is called a false positive. In the second place, the detector can reject the existence of the watermark, even though there is one. This is called a false negative. Details are expounded in [Langelaar, SPM].
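The correlation detector and its threshold decision can be illustrated with a short sketch. It is not the project's implementation: the per-pixel normalization of the correlation, the synthetic uniform host image, the gain, and the threshold `T = k/3` (about half the correlation expected for the correct key with a {-1, 0, 1} pattern) are assumptions chosen only for the example.

```python
import numpy as np

def make_pattern(key, shape):
    """Key-dependent pseudorandom pattern with values in {-1, 0, 1}."""
    rng = np.random.default_rng(key)
    return rng.integers(-1, 2, size=shape).astype(np.float64)

def detect_spatial(image, key, threshold):
    """Correlate the image with the zero-mean pattern and compare with T."""
    pattern = make_pattern(key, image.shape)
    pattern -= pattern.mean()                   # normalize the pattern to zero mean
    corr = float(np.mean(image * pattern))      # correlation per pixel
    return corr, corr > threshold

rng = np.random.default_rng(0)
host = rng.uniform(0, 255, size=(256, 256))     # synthetic stand-in for a host image
k = 4.0
marked = host + k * make_pattern(7, host.shape)

T = k / 3.0                                     # hypothetical threshold
corr_right, found_right = detect_spatial(marked, key=7, threshold=T)
corr_wrong, found_wrong = detect_spatial(marked, key=8, threshold=T)
```

Raising `T` lowers the false positive rate at the cost of more false negatives, and lowering it does the opposite.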
3. Watermarking in the DCT domain
3.1 Embedding of a watermark
3.1.1 Image independent embedding method
This is the simplest embedding method in the DCT domain: it casts the watermark data into a selected region of the DCT domain, i.e. the middle band or the low band. It has some disadvantages. It is hard to detect watermarks without the original image, and perceptibility depends on the sensitivity of the DCT coefficients, i.e. their difference in scale. Despite these deficiencies, the method can make watermarks robust, in the sense that the gain factor (k) can be increased if the watermark data are embedded in a low-sensitivity region such as the low band of the DCT domain.
In this step, the DCT coefficients of the original image are reordered in zig-zag scan, as in JPEG compression. Then the coefficients from the (L+1)th to the (L+M)th are taken according to the zig-zag ordering of the DCT spectrum. The watermark is embedded into the chosen DCT coefficients as
$$t'_{L+i} = t_{L+i} + k\,w_i,$$
where $t'_{L+i}$ are the watermarked DCT coefficients, $t_{L+i}$ are the chosen DCT coefficients, $w_i$ is the watermark data, k is the gain, and i = 1, 2, ..., M.

Figure 3. Image independent watermark embedding
Finally, the watermarked coefficients are reinserted in the zig-zag scan. Then the inverse DCT of the watermarked coefficients gives the watermarked image.
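The whole-image procedure (DCT, zig-zag selection of the (L+1)th to (L+M)th coefficients, additive embedding, inverse DCT) can be sketched as below. This is an illustrative sketch, not the project's code: it assumes a square image, an orthonormal DCT-II built as a matrix product, a purely additive rule with gain `k`, and the hypothetical helper names `dct_matrix`, `zigzag_indices` and `embed_independent`.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that the 2-D DCT of x is C @ x @ C.T."""
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def zigzag_indices(n):
    """(row, col) pairs of an n-by-n array in JPEG-style zig-zag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Walk the anti-diagonals, alternating direction on odd and even diagonals.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))

def embed_independent(image, w, k, L):
    """Add k*w to the (L+1)th..(L+M)th zig-zag DCT coefficients, M = len(w)."""
    n = image.shape[0]                         # square image assumed
    c = dct_matrix(n)
    coeffs = c @ image @ c.T                   # forward 2-D DCT
    idx = np.array(zigzag_indices(n)[L:L + len(w)])
    coeffs[idx[:, 0], idx[:, 1]] += k * w      # additive embedding
    return c.T @ coeffs @ c                    # inverse 2-D DCT

rng = np.random.default_rng(1)
image = rng.uniform(0, 255, size=(16, 16))     # synthetic stand-in for a host image
w = rng.uniform(-0.5, 0.5, size=32)
marked = embed_independent(image, w, k=5.0, L=2)
```

Because the transform is orthonormal, taking the DCT of `marked - image` returns exactly `k * w` at the selected positions, which is what coherent detection relies on.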
3.1.2 Image dependent embedding method
This method also uses a selected region, like the image independent embedding method. However, watermark insertion based on this method is more robust against differences in scale among the DCT coefficients [2].
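In the first case, embedding a watermark into the chosen DCT coefficients is similar to the image independent embedding method. The main difference is that the watermark is scaled by the DCT coefficients when it is inserted:
$$t'_{L+i} = t_{L+i} + k\,t_{L+i}\,w_i,$$
where $t_{L+i}$ are the chosen DCT coefficients, $t'_{L+i}$ the watermarked coefficients, $w_i$ the watermark data and k the gain. With this method it is also difficult to detect watermarks without the original image.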
The second method, proposed in [5], is designed for blind detection, i.e. detecting watermarks without the original image. In this case, the watermark is cast in the middle band to achieve perceptual invisibility. The watermark embedded into the chosen DCT coefficients is scaled by the absolute value of the DCT coefficient:
$$t'_{L+i} = t_{L+i} + k\,|t_{L+i}|\,w_i,$$
where $t_{L+i}$ are the chosen DCT coefficients, $t'_{L+i}$ the watermarked coefficients, $w_i$ the watermark data and k the gain.

Figure 4. Image dependent watermark embedding
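The two image dependent rules can be compared on a few coefficients. The sketch below assumes the forms t + k·t·w for the first case and t + k·|t|·w for the second; the function names and the sample values are hypothetical and serve only to show how the two rules differ for negative coefficients.

```python
import numpy as np

def embed_scaled(t, w, k):
    """First case (assumed form): watermark scaled by the coefficient itself."""
    return t + k * t * w

def embed_abs_scaled(t, w, k):
    """Second case (assumed form): watermark scaled by the coefficient magnitude."""
    return t + k * np.abs(t) * w

t = np.array([120.0, -40.0, 3.0, -0.5])   # illustrative DCT coefficients
w = np.array([1.0, 1.0, -1.0, 1.0])       # illustrative watermark values
first = embed_scaled(t, w, k=0.1)
second = embed_abs_scaled(t, w, k=0.1)
```

For positive coefficients the two rules agree. For negative coefficients the first rule flips the sign of the added term, whereas the second always moves the coefficient in the direction of the watermark value, which is what lets a detector correlate the received coefficients with the watermark without knowing the original.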
The human eye is more sensitive to noise in lower frequency components than in higher frequency ones. However, the energy of most natural images is concentrated in the lower frequency range, and watermark data in the higher frequency components might be discarded by the quantization step of lossy compression. In order to invisibly embed a watermark that can survive lossy data compression, a reasonable trade-off is to embed the watermark into the middle-frequency range of the image [6]. In [5], the middle band of the whole DCT domain is chosen. With regard to JPEG, however, casting watermarks in the middle band of the 8 by 8 block-based DCT domain is more robust.

Figure 5. Definition of middle band frequencies in a DCT block
In the case of embedding a watermark based on the 8 by 8 block DCT, only a fixed number of coefficients in each 8 by 8 image block is used for watermark embedding; this number is determined by the image size and the watermark size M, so that the M watermark values are spread over all blocks [6]. First, the DCT coefficients of each block are reordered in zig-zag scan. Then the coefficients that follow the first L coefficients in the zig-zag ordering of the DCT spectrum are taken; the first L coefficients are skipped so that the watermark is embedded in the middle band.
Embedding a watermark into the chosen DCT coefficients is
![]()
,where n indicates the nth block
The watermarked image is obtained after the watermarked coefficients are reinserted in the zig-zag scan and the block inverse DCT is performed.
3.2 Detection of the watermark
To
detect a watermark in a possibly watermarked image, we calculate the
correlation between the DCT coefficients of the watermarked image and the
pseudorandom noise pattern (watermark data). As mentioned before, the
correlation value will be very high for the embedded watermark and would be
very low otherwise. During the detection process, threshold T is set for
detection.[1]
3.2.1 Coherent detection
Coherent detection is the method of detecting a watermark using the original image. First, a possible watermark is extracted from a watermarked image. For the image independent embedding method, the DCT coefficients of a possibly corrupted image and of the original image are reordered in zig-zag scan, and the coefficients from the (L+1)th to the (L+M)th are selected to extract the watermark. By subtracting the DCT coefficients of the original image, a possible watermark is obtained. Then we calculate the correlation or the correlation coefficient between the extracted watermark and the embedded watermark [5].
Figure 6. Coherent detection
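$$w^*_i = t^*_{L+i} - t_{L+i}, \qquad i = 1, 2, \ldots, M,$$
where $t^*_{L+i}$ are the distorted watermarked DCT coefficients, $t_{L+i}$ are the DCT coefficients of the original image, and $w^*_i$ is the corrupted watermark data.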
This method is simple to analyze when the extracted watermark is not corrupted. Assume that the extracted watermark and a possibly different watermark W are independent and have zero mean. When W differs from the embedded watermark, the expected correlation is z = 0, and the correlation coefficient is also 0. However, if W is the embedded watermark, every term of the correlation sum is non-negative, so z takes a large positive value that is proportional to the energy of the watermark, and the correlation coefficient is 1.
The correlation and the correlation coefficient are therefore used to decide whether the watermark is present.
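Coherent detection can be illustrated on a vector of selected coefficients. The sketch below is a simplified stand-in, not the project's code: it assumes additive embedding with gain `k`, extraction by plain subtraction of the original coefficients, Gaussian numbers in place of real DCT coefficients, and the hypothetical helper names `extract_watermark` and `correlation_coefficient`.

```python
import numpy as np

def extract_watermark(t_received, t_original):
    """Possible watermark: received coefficients minus original coefficients."""
    return t_received - t_original

def correlation_coefficient(a, b):
    """Normalized correlation of two sequences after removing their means."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(3)
M, k = 4096, 5.0
t = rng.normal(0.0, 20.0, size=M)           # stand-in for the selected DCT coefficients
w = rng.uniform(-0.5, 0.5, size=M)          # embedded watermark
w_star = extract_watermark(t + k * w, t)    # uncorrupted case

rho_same = correlation_coefficient(w_star, w)
rho_other = max(abs(correlation_coefficient(w_star, rng.uniform(-0.5, 0.5, size=M)))
                for _ in range(100))        # 100 unrelated watermarks
```

With no attack, `rho_same` is essentially 1 while the unrelated watermarks give values close to 0, so any threshold between the two separates them. Attacks pull `rho_same` down toward the values obtained for unrelated watermarks.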
3.2.2 Blind detection
The method of detecting a watermark without the original image is called blind detection. Here we consider the method of [5], the second image dependent embedding method. The DCT coefficients of a possibly corrupted image are reordered in zig-zag scan, and the coefficients from the (L+1)th to the (L+M)th are selected to form the vector $T^* = \{t^*_{L+1}, \ldots, t^*_{L+M}\}$. Then we calculate the correlation between the potentially corrupted coefficients $T^*$ and the embedded watermark. This correlation is a measure of the watermark presence.

Figure 7. Blind detection
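To analyze this detection, suppose that the watermarked image has not been corrupted. Then we have
$$t^*_{L+i} = t_{L+i} + k\,|t_{L+i}|\,w_i, \qquad i = 1, 2, \ldots, M,$$
where $t_{L+i}$ are the original DCT coefficients, $w_i$ is the embedded watermark data and k is the gain. The correlation z with a possibly different watermark $W = \{W_1, \ldots, W_M\}$ is computed under the following hypothesis: the $t_{L+i}$ and the $w_i$ are zero mean, independent and identically distributed random variables. Then z becomes
$$z = \frac{1}{M}\sum_{i=1}^{M} W_i\,t^*_{L+i} = \frac{1}{M}\sum_{i=1}^{M} W_i\,t_{L+i} + \frac{k}{M}\sum_{i=1}^{M} W_i\,|t_{L+i}|\,w_i .$$
When W differs from the embedded watermark, the expected correlation is $E[z] = 0$, since $t_{L+i}$, $w_i$ and $W_i$ are zero mean and independent. However, if W is the embedded watermark ($W_i = w_i$),
$$z = \frac{1}{M}\sum_{i=1}^{M} w_i\,t_{L+i} + \frac{k}{M}\sum_{i=1}^{M} |t_{L+i}|\,w_i^2 .$$
In this case, $E[z] = k\,\mu_{|t|}\,\sigma_w^2$, where $\mu_{|t|}$ is the mean of $|t_{L+i}|$ and $\sigma_w^2$ is the variance of $w_i$.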
The correlation z can be used to determine whether a watermark is present or not.
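The behaviour of the blind detector can be checked numerically. The sketch below is only an illustration under stated assumptions: the embedding rule is taken to be t + k·|t|·w, the correlation is normalized by 1/M, zero-mean Gaussian numbers stand in for the selected DCT coefficients, and `blind_correlation` is a hypothetical helper name.

```python
import numpy as np

def blind_correlation(t_star, candidate):
    """z = (1/M) * sum_i candidate_i * t*_i, computed without the original image."""
    return float(np.mean(candidate * t_star))

rng = np.random.default_rng(5)
M, k = 16000, 0.2
t = rng.normal(0.0, 10.0, size=M)           # stand-in for middle-band DCT coefficients
w = rng.standard_normal(M)                  # embedded watermark, N(0, 1)
t_star = t + k * np.abs(t) * w              # uncorrupted watermarked coefficients

z_same = blind_correlation(t_star, w)
z_other = blind_correlation(t_star, rng.standard_normal(M))

# Value predicted for the embedded watermark: k * mean(|t|) * var(w)
predicted = k * float(np.mean(np.abs(t))) * float(np.var(w))
```

In this setting `z_same` lands near `predicted` and `z_other` near zero, so a threshold somewhere between 0 and the predicted value would separate the two cases. The sample size M controls how tightly both values concentrate around their expectations.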
4. Template matching
The presence of a known object in a scene can be detected by searching for the location of match between the object template u(m,n) and the scene v(m,n). Template matching can be conducted by searching the displacement of u(m,n), where the mismatch energy is minimum. For a displacement (p,q), we define the mismatch energy
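$$\sigma^2(p,q) = \sum_m\sum_n \left[v(m,n) - u(m-p,\,n-q)\right]^2 = \sum_m\sum_n |v(m,n)|^2 + \sum_m\sum_n |u(m,n)|^2 - 2\sum_m\sum_n v(m,n)\,u(m-p,\,n-q).$$
For $\sigma^2(p,q)$ to achieve a minimum, it is sufficient to maximize the cross-correlation
$$c_{vu}(p,q) = \sum_m\sum_n v(m,n)\,u(m-p,\,n-q).$$
From the Cauchy-Schwarz inequality, we have
$$|c_{vu}(p,q)| \le \left[\sum_m\sum_n |v(m,n)|^2\right]^{1/2}\left[\sum_m\sum_n |u(m,n)|^2\right]^{1/2},$$
where the equality occurs if and only if $v(m,n) = \alpha\,u(m-p,\,n-q)$, where $\alpha$ is an arbitrary constant and can be set equal to 1. This means the cross-correlation $c_{vu}(p,q)$ attains its maximum value when the displaced position of the template coincides with the observed image. Then, we obtain
$$c_{vu}(p,q) = \sum_m\sum_n |v(m,n)|^2 > 0,$$
and the desired maximum occurs when the observed image and the template are spatially registered. Therefore, a given object u(m,n) can be located in the scene by searching the peaks of the cross-correlation function (see Figure 8). Often the given template and the observed image are not only spatially translated but are also relatively scaled and rotated. For example,
$$v(m,n) = \alpha\,u\!\left(\frac{m-p'}{\gamma_1},\,\frac{n-q'}{\gamma_2};\,\theta\right),$$
where $\gamma_1$ and $\gamma_2$ are the scale factors, (p', q') are the displacement coordinates, and $\theta$ is the rotation angle of the observed image with respect to the template. In such cases the maximum of the cross-correlation function has to be searched in the parameter space (p', q', $\gamma_1$, $\gamma_2$, $\theta$). This can become quite impractical unless reasonable estimates of $\gamma_1$, $\gamma_2$ and $\theta$ are given [Jain].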
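The cross-correlation is also called the area correlation. It can be evaluated either directly or as the inverse Fourier transform of the product of the Fourier transform of v(m,n) and the complex conjugate of the Fourier transform of u(m,n) [Jain].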
Figure 8. Arrangement for obtaining the correlation
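Both ways of evaluating the area correlation, directly and through the Fourier transform, can be sketched for a pure translation. This is a minimal illustration under assumptions: the scene is zero-mean synthetic noise (so that the raw, unnormalized correlation has a clear peak), the template is cut from the scene, and the function names are hypothetical.

```python
import numpy as np

def cross_correlation(v, u):
    """Direct evaluation of c(p,q) = sum_{m,n} v(m,n) u(m-p, n-q),
    for every displacement (p,q) that keeps the template inside the scene."""
    H, W = v.shape
    h, w = u.shape
    c = np.empty((H - h + 1, W - w + 1))
    for p in range(H - h + 1):
        for q in range(W - w + 1):
            c[p, q] = np.sum(v[p:p + h, q:q + w] * u)
    return c

def cross_correlation_fft(v, u):
    """Same values as the inverse FFT of V * conj(U), with u zero-padded to v's size."""
    H, W = v.shape
    h, w = u.shape
    V = np.fft.fft2(v)
    U = np.fft.fft2(u, s=v.shape)
    c = np.real(np.fft.ifft2(V * np.conj(U)))
    return c[:H - h + 1, :W - w + 1]        # displacements without wrap-around

rng = np.random.default_rng(2)
scene = rng.standard_normal((40, 40))       # zero-mean synthetic scene (assumption)
template = scene[10:22, 5:17].copy()        # object located at displacement (10, 5)

c = cross_correlation(scene, template)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(c), c.shape))
```

The direct form costs one template-sized sum per displacement, while the FFT form computes all displacements at once, which is the usual choice when the search range is large. On natural images the raw correlation is biased toward bright regions, so a mean-removed or normalized correlation is often preferred in practice.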
5. Possible attacks on watermarks
An attack on a watermark can be defined as an operation, coincidental or hostile, that may degrade a watermark and possibly make it unreliably detectable. In the literature [Kutter][Hartung], attacks are classified in various ways; one of the most popular classifications sorts them into the following three categories.
5.1 JPEG compression
JPEG is currently one of the most widely used compression algorithms for images and any marking system should be resilient to some degree of compression. The degree of compression is controlled by the quality factors, i.e., 90, 80, 70 ... 10. Although images compressed with a very low quality factor do not have much commercial value, some marking systems do survive them. Hence using a broad scale of compression parameters gives more accurate comparison.
5.2 Geometric transformations
5.3 Enhancement techniques
6. Proposed Method
6.1 Overall scheme
The entire flow of our project is shown in Figure 9 below.

Figure 9. Flow of the dual detection
In the figure above, each annotated block is performed in one of the two domains (DCT or spatial); the blocks without annotation can be performed in either the DCT or the spatial domain. Basically, spatial domain detection can be applied regardless of the embedding technique used in the DCT domain, as long as geometric rotation is involved, and in our scheme it inherently requires the original image. The DCT domain detection depends on the embedding scheme; the DCT domain watermark embedding and detection schemes are described in detail in "3. Watermarking in the DCT domain". Resynchronization can be achieved using the block-based search explained in the next part, 6.2. When the watermark is embedded according to the image-independent scheme, the potential equivalent watermark can be extracted directly from the embedding formulation in "3. Watermarking in the DCT domain". In the case of the image-dependent embedding scheme, since it is hard to extract the potential equivalent watermark in the spatial domain, we regard the difference between the original image and the attacked image as the potential equivalent watermark. Then, by assuming it to be a random pattern, we can use the same correlation-based detection technique in the spatial domain.
6.2 Block-based search for synchronization
When the geometric attack, especially the rotation, is applied to the watermark embedded in the DCT domain, it is hard to detect the watermark in the DCT domain due to the loss of synchronization. Then, as explained in 6.1, the attacked image should be resynchronized so that the potential equivalent watermark could be extracted. Template matching introduced in 4. is extended to be used to find the lost synchronization for geometric attacks in which the rotation is involved. The following figure explains the details.

Figure 10. Block-based search for synchronization
The figure above shows the attacked image and the test image, which is made from the original image to find the synchronization. The test image is generated for several rotation angles (in degrees) and several scale factors. Each test image is divided into several blocks, and each block u(m-p, n-q) is correlated with the same-size block within the search window v(m,n) in the attacked image, for every pixel position in the search window. Here, u(m-p, n-q) and v(m,n) are exactly the same as those explained in "4. Template matching". The correlation between two blocks can then be written as
$$c(p,q) = \sum_m\sum_n v(m,n)\,u(m-p,\,n-q),$$
where p and q are the possible pixel positions within the search window. The maximum correlation found in a specific search window becomes the correlation value for that block. Since the test image consists of several blocks, we add up the correlation values of all blocks to obtain the entire correlation sum.
Finally, the resizing factor is taken into consideration because of the resizing attack. When the test image is resized to match the attacked image, the correlation sum is normalized by a factor that depends on the resizing factor, since two resized images are being correlated. For several resizing factors and rotation angles, we determine the synchronization angle and the resizing factor that maximize the scaled correlation sum. After we find the synchronization, we can resynchronize the geometrically attacked image and extract the potential watermark in the spatial domain. Using the correlation between the potential watermark and the original watermark, we can determine whether the watermark exists or not.
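The block-based search can be sketched in a reduced form. The sketch below is not the project's implementation and makes several simplifying assumptions: candidate rotations are restricted to quarter turns via `np.rot90` (no interpolation is needed), the scale search and its normalization are omitted, the images are zero-mean synthetic noise, and the block size, search margin and function names are hypothetical.

```python
import numpy as np

def best_block_correlation(block, window):
    """Maximum of sum(patch * block) over all block positions inside the window."""
    h, w = block.shape
    best = -np.inf
    for p in range(window.shape[0] - h + 1):
        for q in range(window.shape[1] - w + 1):
            best = max(best, float(np.sum(window[p:p + h, q:q + w] * block)))
    return best

def correlation_sum(test, attacked, block=16, margin=2):
    """Sum over all blocks of the best correlation within +/- margin pixels."""
    total = 0.0
    H, W = test.shape
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            r0, c0 = max(r - margin, 0), max(c - margin, 0)
            window = attacked[r0:r + block + margin, c0:c + block + margin]
            total += best_block_correlation(test[r:r + block, c:c + block], window)
    return total

def find_rotation(original, attacked, candidates=(0, 1, 2, 3)):
    """Return the quarter-turn count whose test image maximizes the correlation sum."""
    scores = {q: correlation_sum(np.rot90(original, q), attacked) for q in candidates}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(11)
original = rng.standard_normal((64, 64))                      # zero-mean synthetic image
attacked = np.rot90(original, 1) + 0.1 * rng.standard_normal((64, 64))
best, scores = find_rotation(original, attacked)
```

For arbitrary angles and scale factors the test images would be produced by an interpolating rotation and resize, and the search would loop over a grid of (angle, scale) pairs instead of four quarter turns. The small per-block search window lets each block absorb local misalignment that a single global correlation would not tolerate.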
7. Experiments and Results
7.1 Generation of a watermark
When image-independent embedding and coherent detection are used, uniformly distributed random numbers in [-0.5, 0.5] are used as the watermark pattern. For the watermarking scheme that uses image-dependent embedding and blind detection, the principle underlying our watermark-generating strategy is that the mark be constructed from independent, identically distributed (i.i.d.) samples drawn from a Gaussian distribution [2]. Once the significant components are located, Gaussian noise is injected therein. The watermark consists of a sequence of real numbers, each chosen independently according to N(0, 1) (where N denotes a normal distribution). In our experiment, watermark patterns were embedded in the DCT domain.
7.2 Results in the DCT domain
To test the watermarking schemes we used three 256 by 256 images: lena, cman, and lotus. In addition, 1000 random watermarks were generated to evaluate the DCT domain detection method.

Figure 11. Test images : Lena, Cameraman, Lotus
7.2.1 Image independent embedding in the DCT domain and coherent detection
We generated a uniformly distributed watermark of length 64*64, with values drawn from U[-1/2, 1/2], and embedded it using the image independent embedding method of 3.1.1. In the whole-image DCT case we set k=5 and L=2, and in the block-based DCT case k=5 and L=2.