Suggested Topics for the Course Project
Please refer your questions to the people listed in the specific topic. They
may provide you with further references and code. Most papers are available at
the IEEE Xplore web
site.
You may contact a member of the Image, Video and Multimedia Systems Group, either
someone named in the topic lists or someone presenting their research (see schedule),
for advice before and after submitting your proposal.
I. WYNER-ZIV CODING
| Area |
Wyner-Ziv
Video Coding
(see more on this area below)
There are a number of applications in
which we would like to code video with a low-complexity encoder, but a
high-complexity decoder is affordable. Distributed source coding, and in
particular Wyner-Ziv coding, are data compression approaches that can make
this possible with efficiency close to that of conventional systems, based
on high-complexity coders.
Wyner-Ziv coding, named after
[1], consists of lossy source coding with decoder side information. More
precisely, the source data X is encoded with a rate constraint, and
decoded with a certain distortion, using some side information Y available
at the decoder only. Although the values Y takes on are not available at
the encoder, the statistical dependence between X and Y is known, and
exploited when designing the entire system. Information-theoretical
studies suggest that the compression efficiency achieved can be similar to
the case in which the side information Y is available at the encoder as
well. In fact, in the lossless case, known as Slepian-Wolf coding, the
rate in both cases is equal to H(X|Y) [2].
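To make the Slepian-Wolf rate H(X|Y) concrete, here is a minimal sketch that evaluates it for a toy model. The model is an assumption made only for illustration: X is a fair bit and Y is X flipped with probability 0.1. The function names are hypothetical as well.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def conditional_entropy(joint: dict) -> float:
    """H(X|Y) in bits from a joint pmf given as {(x, y): probability}."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)


# Assumed toy model: X is a fair bit, Y equals X flipped with probability 0.1.
CROSSOVER = 0.1
joint = {
    (0, 0): 0.5 * (1 - CROSSOVER), (0, 1): 0.5 * CROSSOVER,
    (1, 0): 0.5 * CROSSOVER, (1, 1): 0.5 * (1 - CROSSOVER),
}

rate_without_side_info = binary_entropy(0.5)      # H(X)
rate_with_side_info = conditional_entropy(joint)  # H(X|Y), the Slepian-Wolf rate
print(f"H(X) = {rate_without_side_info:.3f} bits, "
      f"H(X|Y) = {rate_with_side_info:.3f} bits")
```

For this symmetric model H(X|Y) equals the binary entropy of the crossover probability, so the stronger the dependence between X and Y, the fewer bits the encoder needs to send.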
Despite the recent theoretical
and experimental effort made to build practical Wyner-Ziv coders [3,4],
there is a fundamental problem still unsolved. Motion compensation led to
a substantial improvement in video coding, but also increased the
complexity of video encoders. If a low-complexity encoder is desired, we
face the challenge of designing a Wyner-Ziv video coder where the motion
compensation is almost entirely carried out at the decoder. This is
theoretically possible if one thinks of the source data as the current
frame and of the side information as the previously reconstructed frames.
The following project topics are
based on ideas of the IVMS group to solve the problem of practical
Wyner-Ziv motion compensation.
|
| Contact |
David
Rebollo-Monedero (also Anne
Aaron and Shantanu Rane)
|
| Topics |
1.
Wyner-Ziv Video Coding with Increasingly Accurate Motion-Compensated Side
Information
The current frame is divided into
blocks, each of which undergoes some transformation, and a scan order is
defined on groups of the resulting coefficients.
Initially, the previously
reconstructed frames without motion compensation or with motion
compensation based on previous motion vector fields are used to estimate
the first group of transform coefficients. Using this estimate as side
information, the first group of transform coefficients is Wyner-Ziv coded.
At the decoder, this coarse reconstruction is used to obtain more accurate
motion estimation and motion-compensated side information. Successive
groups of transform coefficients are coded with more and more accurate
side information, obtained with some motion estimation technique based on
intermediate reconstructions and previous motion vectors. At the last
stages, a rather fine reconstruction of the current block is available and
advanced motion-compensation techniques such as subpixel resolution or
variable block sizes are possible.
[3] suggests the use of the DCT
as blockwise transform. As for the scan order, the conventional zigzag
scan seems reasonable, provided that high frequencies are more susceptible
to accurate motion compensation than low frequencies. The maximum number
of stages, equal to the number of DCT coefficients, would lead to the best
performance but also the highest decoder complexity. Also, at the first stage, or even at each stage,
some additional information can be sent to help the decoder perform the
motion estimation (see 'hash' in note below).
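The zigzag scan and the grouping of coefficients into coding stages can be sketched as follows. This is only an illustration: the 8x8 block size, the equal-sized groups, and the function names are assumptions, not part of the proposed scheme.

```python
def zigzag_order(n: int = 8):
    """Return the (row, column) positions of an n x n block in zigzag order.

    Positions are visited along anti-diagonals (constant row + column),
    alternating direction from one diagonal to the next, as in the
    conventional JPEG-style zigzag scan.
    """
    def key(pos):
        i, j = pos
        diagonal = i + j
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (diagonal, i if diagonal % 2 else -i)

    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def split_into_stages(order, num_stages: int):
    """Split a scan order into contiguous groups, one group per coding stage."""
    base, extra = divmod(len(order), num_stages)
    groups, start = [], 0
    for k in range(num_stages):
        size = base + (1 if k < extra else 0)
        groups.append(order[start:start + size])
        start += size
    return groups


# Four stages of 16 coefficients each (an arbitrary choice for illustration).
stages = split_into_stages(zigzag_order(8), num_stages=4)
for index, group in enumerate(stages, start=1):
    print(f"stage {index}: {len(group)} coefficients, first {group[0]}, last {group[-1]}")
```

Setting `num_stages` to the number of coefficients gives the one-coefficient-per-stage extreme mentioned above; smaller values trade performance for lower decoder complexity.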
The objective of the
project is to implement a Wyner-Ziv video coder based on this
idea, and to compare it with other distributed and nondistributed schemes.
Note:
The formulation above is in fact pretty general. For instance, it
would also include a two-stage coding setting using pixel-domain block
subsampling for motion-compensation followed by DCT. In this case, the
transform is linear, and represented by a matrix composed of a few
canonical basis vectors and also the DCT vectors, which gives an
overcomplete representation of the source data. There are two coefficient
groups only. The role of the first group consists of providing some type
of message digest or ‘hash’ to make motion compensation for the second
group at the decoder possible. More complicated ‘hashes’ can be used,
possibly nonlinear. Subsampling in the pixel, frequency or bit domain is possible. In fact, this is one of the lines of research of the
group, proposed by Prof. Girod.
2.
Motion Compensation for Wyner-Ziv Video Coding
A generalization of Wyner-Ziv coding
is Wyner-Ziv coding of noisy sources, in which the observed data Z to
encode and the target data X to be reconstructed are different random
variables, possibly in different alphabets. In this case, X is the motion
of a block in the current frame, which is not known at the encoder, Z is
the block itself, and Y is the previously reconstructed frame. How much
information about Z do we need to transmit if we are interested in
recovering X only? At a second coding stage, X shall be used to
motion-compensate Y and recover Z with very small rate, hopefully small
enough to make up for the coding of the motion vector.
The
objective of the project is to study the problem of Wyner-Ziv
coding of the motion vectors only, theoretically or practically, in terms
of rate and distortion. The second problem, namely the second stage of the
Wyner-Ziv coding, is not a primary objective.
|
| References |
[1] A. D. Wyner, J. Ziv, “The
rate-distortion function for source coding with side information at the
decoder,” IEEE Trans. Inform. Theory, vol. IT-22, Jan. 1976.
[2] D.
Slepian and J. K. Wolf, “Noiseless coding of correlated information
sources,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 471–480, Jul.
1973.
[3] D. Rebollo-Monedero, A. Aaron and
B. Girod, "Transforms for High-Rate Distributed Source Coding,"
Asilomar Conference on Signals, Systems and Computers, 2003 (invited
paper) [PDF]
[4] B.
Girod, A. Aaron, S. Rane and D. Rebollo-Monedero, "Distributed Video
Coding", in Proc. IEEE, Special Issue on Advances in Video Coding and
Delivery, 2003 (invited paper, submitted) [PDF]
|
| Area |
Wyner-Ziv Video Coding (see more on this area above)
Wyner-Ziv coding – source coding with side information only at the decoder – has been shown to be useful and suitable for certain video applications. In our recent work we applied Wyner-Ziv coding to build an intraframe encoder & interframe decoder system which has a very simple encoder, suitable for low-complexity video applications, such as mobile camera-phones, wireless PC cameras and surveillance cameras. We have also used Wyner-Ziv coding techniques to develop a novel error resiliency scheme for video broadcasting, which outperforms traditional forward error correction schemes and does not require a layered video bitstream for graceful quality degradation. Although this has been an active research area in the last few years, there are still many open problems which need to be solved to make Wyner-Ziv coding more practical for real-world systems.
|
| Contact |
Anne Aaron and Shantanu Rane
|
| Topics |
1. Codes for Wyner-Ziv Coding
Channel codes have been shown to work well for source coding with decoder side information. In our current systems, we use a turbo codec as a near-lossless Slepian-Wolf codec. Design and study the compression efficiency of other channel codes, especially Low Density Parity Check (LDPC) codes, and investigate how practical they are for video systems. One important aspect to study is the rate flexibility of the codes for changing source statistics.
2. Rate Control for Wyner-Ziv Coding
For Wyner-Ziv coding scenarios, the rate is dependent on the statistics between the source and the side information. However, the side information is not available at the encoder. Therefore, determining the rate at the encoder is an important issue. Our current rate control assumes feedback from the decoder to the encoder. Investigate better Wyner-Ziv rate control schemes, especially for our current low-complexity video encoder.
3. Other Applications of Wyner-Ziv Coding
We have shown that Wyner-Ziv coding can be used for low-complexity video encoding, error resiliency schemes for video broadcasting and compression for light field images. Can Wyner-Ziv coding be used for other video applications, such as layered video coding, multiple description coding, etc.?
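As a toy illustration of the idea in Topic 1, that a channel code can serve as a Slepian-Wolf code, the sketch below uses the (7,4) Hamming code in place of the turbo or LDPC codes discussed above. The correlation model is an assumption: the decoder's side information y differs from the source word x in at most one of 7 bits. The encoder sends only the 3-bit syndrome of x.

```python
def syndrome(bits):
    """3-bit syndrome of a 7-bit word under the (7,4) Hamming parity-check
    matrix whose k-th column is the binary representation of k (k = 1..7).
    The syndrome is returned as an integer in 0..7."""
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s


def sw_encode(x):
    """Encoder: compress 7 source bits to a 3-bit syndrome, without seeing y."""
    return syndrome(x)


def sw_decode(s_x, y):
    """Decoder: recover x from its syndrome and side information y,
    assuming x and y differ in at most one position."""
    # syndrome(x) XOR syndrome(y) is the syndrome of the difference x XOR y,
    # which for a single-bit difference is the (1-based) differing position.
    error_position = s_x ^ syndrome(y)
    x_hat = list(y)
    if error_position:
        x_hat[error_position - 1] ^= 1
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 0, 0, 1]  # side information: x with the fourth bit flipped
assert sw_decode(sw_encode(x), y) == x
```

The encoder never looks at y, yet 3 bits suffice instead of 7 because the decoder resolves the remaining ambiguity with the side information. If the assumed correlation model is violated (more than one differing bit), decoding fails, which is why rate flexibility under changing statistics matters.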
|
| References |
[1] A. D. Wyner, J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inform. Theory, vol. IT-22, Jan. 1976.
[2] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 471–480, Jul. 1973.
[3] B. Girod, A. Aaron, S. Rane and D. Rebollo-Monedero, "Distributed Video Coding", in Proc. IEEE, Special Issue on Advances in Video Coding and Delivery, 2003 (invited paper, submitted)
|
| Area |
Wyner-Ziv
Quantization
Design of optimal quantizers with
decoder side information (see area “Wyner-Ziv Video Coding” for an
explanation of Wyner-Ziv coding).
|
| Contact |
David
Rebollo-Monedero
|
| Topics |
1.
High-Rate Wyner-Ziv Quantization with an Unconditional Entropy Constraint
A recent extension of the Lloyd
algorithm exists to design optimal quantizers for distributed source
coding [1]. A high-rate approximation principle has been established for
the case in which the rate equals the conditional entropy of the
quantization index given the side information, i.e., R=H(Q|Y) [2]. We
would like to find a high-rate approximation for the case R=H(Q). This
approximation could shed some light on optimal implementation of Slepian-Wolf
codes.
The objective
of the project is to find a mathematical theorem that characterizes
Wyner-Ziv quantizers as R=H(Q) goes to infinity, along with an
experimental verification for a particular case, for instance, Gaussian.
Matlab code implementing the Lloyd algorithm for this particular case is
available.
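The two rate measures discussed in this topic, H(Q) and H(Q|Y), can be compared empirically with a short simulation. The sketch below is not the Lloyd design of [1]; it simply applies a uniform scalar quantizer under assumed conditions: X is standard Gaussian, Y = X + Gaussian noise, and Y is binned (bin width `y_bin`) only so that the conditional entropy can be estimated from counts. All names and parameter values are hypothetical.

```python
import random
from collections import Counter
from math import floor, log2


def entropy(counts) -> float:
    """Empirical entropy in bits of a Counter of occurrences."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def quantizer_rates(step, noise_std=0.5, y_bin=0.25, n=20000, seed=0):
    """Estimate H(Q) and H(Q|Y) for a uniform quantizer of step `step`.

    Assumed model: X ~ N(0, 1), Y = X + N(0, noise_std^2).
    Y is discretized into bins of width `y_bin` for the estimate only.
    """
    rng = random.Random(seed)
    q_counts, y_counts, joint_counts = Counter(), Counter(), Counter()
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, noise_std)
        q = floor(x / step)         # quantization index Q
        y_index = floor(y / y_bin)  # binned side information
        q_counts[q] += 1
        y_counts[y_index] += 1
        joint_counts[(q, y_index)] += 1
    h_q = entropy(q_counts)
    h_q_given_y = entropy(joint_counts) - entropy(y_counts)  # H(Q,Y) - H(Y)
    return h_q, h_q_given_y


for step in (1.0, 0.5, 0.25):
    h_q, h_q_given_y = quantizer_rates(step)
    print(f"step {step}: H(Q) ~ {h_q:.2f} bits, H(Q|Y) ~ {h_q_given_y:.2f} bits")
```

The gap between the two estimates is the rate that an ideal Slepian-Wolf coder would save over entropy coding the index alone. Note that the count-based estimate of H(Q|Y) is biased downward when the bins are fine relative to the sample size.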
|
| References |
[1] D. Rebollo-Monedero, R. Zhang and
B. Girod, "Design of Optimal Quantizers for Distributed Source
Coding," Data Compression Conference (DCC), 2003 [PDF]
[2] D.
Rebollo-Monedero, A. Aaron and B. Girod, "Transforms for High-Rate
Distributed Source Coding," Asilomar Conference on Signals, Systems
and Computers, 2003 (invited paper) [PDF]
|
II. NETWORK STREAMING
| Area |
Congestion-distortion
optimized scheduling of video over a bottleneck link
In a multimedia streaming system, schedulers are
responsible for determining the transmission times of different packets of
a sequence. When the channel conditions are adverse, e.g. varying delay or
random losses, each packet cannot always be reliably delivered to the
receiver by its playout deadline. To address this issue, smart schedulers
which seek to maximize the reconstructed media quality have been proposed
recently [1, 2]. In a bandwidth-limited environment, the impact of the
sender also needs to be taken into account as the media stream itself
might overwhelm a bottleneck link and create large amounts of congestion.
This has motivated us to develop a scheduler which strives to achieve the
highest reconstructed media quality for a given level of congestion [3].
|
| Contact |
Eric Setton
|
| Topics |
1. Comparison of layered and multiple description coding
using the CoDiO scheduler
Several comparisons of layered coding and multiple
description coding using heuristic or rate-distortion optimized schedulers
have been conducted [4]. The aim of the project would be to pursue a new
version of this comparison for a bandwidth-limited environment and with a
new scheduler. Students working on this project would learn how to use the
H.264 encoder/decoder to generate compressed sequences with different
encoding structures and would simulate the video streaming/scheduling in
the network simulator ns-2 (code and help will be provided!). Motivated
students may also want to consider other kinds of source coding techniques
in their comparisons.
2. Delivering video to a mobile wireless client
Work on CoDiO has focused so far on a fixed bottleneck
bandwidth. The idea of this project is to extend this to a mobile wireless
client. In this case, the capacity of the last hop will vary making the
channel estimate inaccurate. Students working on this project will analyze
how feedback may be used to estimate the channel variation and how the
frequency of the estimation impacts the performance of video streaming
with the CoDiO scheduler. Students will simulate video
streaming/scheduling in the network simulator ns-2 (code and help will be
provided!). Motivated students may also want to consider the impact of
time-varying cross traffic over the bottleneck link.
|
| References |
[1] P. A. Chou and Z. Miao,
"Rate-distortion optimized streaming of packetized media," IEEE
Transactions on Multimedia, February 2001. Submitted. Can be found at : http://research.microsoft.com/~pachou
[2] M. Kalman, P. Ramanathan, and B. Girod,
"Rate-Distortion Optimized Streaming with Multiple Deadlines,"
Proc. IEEE International Conference on Image Processing, ICIP-2003,
Barcelona, Spain, Sept. 2003. Can be found at : http://www.stanford.edu/~bgirod/publications.html
[3] E. Setton and B. Girod, "Congestion-distortion
optimized scheduling of video over a bottleneck link," submitted to
MMSP 2004. Can be found at : http://www.stanford.edu/~esetton/publications.htm
[4] J. Chakareski, S. Han, and B. Girod, "Layered
Coding vs. Multiple Descriptions for Video Streaming Over Multiple
Paths," Proc. ACM Multimedia 2003, Berkeley, CA, Nov. 2003. Can be
found at : http://www.stanford.edu/~bgirod/publications.html
|
| Area |
Video Streaming Over Ad Hoc Wireless Network
In an ad hoc wireless network, resources are limited and channel qualities fluctuate over time. This poses intriguing challenges for video streaming, an application with stringent delay constraints and high bitrate requirement. Design of the video source coding, rate allocation over the channels, as well as the routing algorithm, need to be tailored for the characteristics of the network, or optimized jointly.
|
| Contact |
Xiaoqing Zhu
|
| Topics |
1. Rate adaptation over wireless ad hoc network
One way to mitigate the effect of channel fluctuation on received video quality is to adapt the source rate accordingly and avoid unnecessary packet losses. Rate adaptation techniques include straightforward transcoding [1], skipping less important frames for prediction-based video coders (e.g., MPEG-2 B frames), dropping enhancement information in a stream with a layered representation (e.g., in H.263+), switching between multiple representations of a video stream (e.g., SP frames in H.26L), or simply truncating bitstreams if the video stream has an embedded representation (e.g., FGS coding in MPEG-4/H.26L or 3-D wavelet coders [2]).
In this project, one can choose one or more of the rate adaptation techniques and see how they perform over a time-varying wireless ad hoc network. Experiments can be carried out with simple Markovian channel models or over a simulated network using ns-2 [3].
Starter code and help are provided.
2. Performance of ad hoc routing protocols for video streaming
Routing protocols for ad hoc wireless networks have been a popular topic of current research [4]. Existing algorithms include Dynamic Source Routing (DSR) [5], Ad hoc On-Demand Distance Vector (AODV) [6], etc. The design objectives of these algorithms generally involve minimizing the number of hops in a path, and may not necessarily work well with video streaming. The goal of the project is to evaluate the performance of the routing protocols for video streaming over an ad hoc network. Experiments will be performed over a simulated network using ns-2 [3]. (The mobile nodes and routing algorithms are provided by the network simulator.) One may also propose additional improvements for the video streaming scheme. Starter code and help are offered.
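A simple Markovian channel model of the kind mentioned in Topic 1 can be sketched as a two-state (Gilbert-Elliott style) packet-loss process. The transition and loss probabilities below are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def simulate_two_state_channel(num_packets, p_good_to_bad=0.05, p_bad_to_good=0.3,
                               loss_good=0.01, loss_bad=0.5, seed=0):
    """Return a list of booleans, True where the packet is lost.

    The channel alternates between a 'good' and a 'bad' state according to a
    two-state Markov chain; each state has its own packet-loss probability.
    """
    rng = random.Random(seed)
    state_is_bad = False
    losses = []
    for _ in range(num_packets):
        loss_probability = loss_bad if state_is_bad else loss_good
        losses.append(rng.random() < loss_probability)
        # State transition for the next packet.
        if state_is_bad:
            if rng.random() < p_bad_to_good:
                state_is_bad = False
        elif rng.random() < p_good_to_bad:
            state_is_bad = True
    return losses


def stationary_loss_rate(p_good_to_bad=0.05, p_bad_to_good=0.3,
                         loss_good=0.01, loss_bad=0.5):
    """Long-run average loss rate implied by the chain's stationary distribution."""
    prob_bad = p_good_to_bad / (p_good_to_bad + p_bad_to_good)
    return (1 - prob_bad) * loss_good + prob_bad * loss_bad


losses = simulate_two_state_channel(200_000)
print(f"simulated loss rate:  {sum(losses) / len(losses):.4f}")
print(f"stationary loss rate: {stationary_loss_rate():.4f}")
```

Unlike independent losses at the same average rate, this model produces bursts of losses while the channel is in the bad state, which is the behavior a rate adaptation scheme has to cope with.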
|
| References |
[1] A. Vetro, C. Christopoulos and H. Sun, "Video transcoding architectures and techniques: an overview", IEEE Signal Processing Magazine, March 2003 pp.18 - 29
[PDF]
[2] X. Zhu, S. Han and B. Girod, "Congestion-optimized multipath streaming of video over ad hoc wireless network", IEEE International Conference on Image Processing, ICIP-2004, Singapore, October,
2004 [PDF]
[3] NS-2: The Network Simulator http://www.isi.edu/nsnam/ns
[4] E. M. Royer and C-K Toh, "A review of current routing protocols for ad hoc mobile wireless network", IEEE Personal Communication, April 1999,
pp.46-55 [PDF]
[5] D. B. Johnson and D. A. Maltz, "Dynamic source routing in ad hoc wireless networks", Mobile Computing,
1996 [PDF]
[6] M. K. Marina and S. R. Das, "On demand multipath distance vector routing in ad hoc network", IEEE International Conference on Network Protocols, Riverside, CA, USA, November 2001,
pp.14-23 [PDF]
|
| Area |
SP-frames for video coding
|
| Contact |
Prashant Ramanathan
|
| Topics |
1. Investigation of SP-frames for video coding
SP-frames have been incorporated in the H.264 standard to allow for switching between bitstreams that have different quality levels, as well as enable error resiliency, and random access without any prediction mismatch [1]. In this project, you will implement SP-frames, and investigate and optimize coder parameters and see how they affect overall compression efficiency under various conditions. Possible improvements include better entropy coding, combining loop filter with the transform for prediction error, and selecting the best quantization parameters for the primary and the switching bitstreams.
|
| References |
[1] M. Karczewicz, R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, July 2003.
|
III. LIGHT FIELDS
| Area |
Light Field Streaming & Compression
|
| Contact |
Prashant Ramanathan
Contact Prashant Ramanathan at pramanat@stanford.edu for more details, and
pointers to relevant papers.
|
| Topics |
1. User Interaction Modelling for Light Field Streaming
Light fields are image-based rendering data sets that allow users to experience a photo-realistic environment or interact with a photo-realistic object. Recently, there has been preliminary work on rate-distortion optimized streaming of these large light field data sets over a lossy packet network. Currently, this work assumes perfect knowledge of the users' view trajectory. In this project, you would investigate and model the way a user interacts with a light field, and predict the future actions of the user, for use by the streaming system.
2. Geometry/Image Trade-off in Light Field Compression
Light fields are image-based rendering data sets that tend to be very
large. Accurate geometry is important for the efficient compression of
these data sets, but it also requires more bits to encode the geometry.
In this project, you will experimentally (and possibly theoretically)
investigate the optimal trade-off in the bits spent on geometry and image
information. You will experiment with and possibly implement various
geometry coding algorithms, and examine how this affects the trade-off.
3. Good Heuristic Schemes for Light Field Streaming Packet Scheduling
Rate-distortion optimized (RaDiO) packet scheduling has been applied to the problem of interactive light field streaming [1]. Effective rate control and computational complexity are challenges that the RaDiO approach faces in order to be useful in a practical setting. Recent work on streaming scalable light field data sets [2] takes a different approach that is less computationally complex and more amenable to rate control. This project explores the design and implementation of a "heuristic" algorithm for packet scheduling that can come close to RaDiO in terms of R-D performance, but is more practical.
|
| References |
[1] P. Ramanathan, M. Kalman, B. Girod, "Rate-Distortion Optimized Streaming of Compressed Light Fields", Proc. ICIP 2003, Barcelona, Spain, Sep 2003.
[2] C.-L. Chang, B. Girod, "Rate-Distortion Optimized Interactive Streaming for Scalable Bitstreams of Light Fields", VCIP-2004.
Other: Contact Prashant.
|
IV. WAVELET VIDEO CODING
| Area |
3-D wavelet video coding with scalable motion vectors
Over the years, many researchers have proposed 3-D
wavelet coding of video sequences. Thanks to the multi-resolution nature
of wavelet transforms as well as efficient embedded coding of the wavelet
coefficients, 3-D subband video coding provides great support for
scalability, a very desirable feature when transmitting video over the
network. However, linear transforms applied in the temporal direction may
be inefficient if the motion between frames is not fully exploited.
Many attempts have been made to incorporate motion
compensation into the 3-D wavelet video coding framework. Earlier works are
somewhat unsatisfactory in terms of the rate-distortion coding performance
because the motion vector field is severely restricted and the temporal
transform is usually limited to the two-tap Haar wavelet. Recently,
motion-compensated lifting has been proposed, which successfully
incorporates unrestricted motion compensation into 3-D wavelet coding and
provides compression efficiency approaching the state-of-the-art
predictive video coding schemes [1][2][3].
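The reason lifting can accommodate unrestricted motion compensation is that each lifting step is invertible by construction, whatever warping operator is used inside it. The sketch below illustrates this for one level of a temporal Haar transform on a pair of frames. It is a toy under stated assumptions: frames are 1-D lists, motion is a single circular shift, and the function names are hypothetical. It does not reproduce the schemes of [1][2][3].

```python
def warp(frame, shift):
    """Circularly shift a 1-D 'frame' by `shift` samples.

    This is a toy stand-in for motion compensation; real schemes use
    block-based or mesh-based motion fields.
    """
    n = len(frame)
    return [frame[(i - shift) % n] for i in range(n)]


def mc_haar_forward(frame_a, frame_b, shift):
    """One level of motion-compensated Haar lifting on a frame pair."""
    # Predict step: high-pass frame = B minus motion-compensated A.
    high = [b - wa for b, wa in zip(frame_b, warp(frame_a, shift))]
    # Update step: low-pass frame = A plus half of the inversely warped high-pass.
    low = [a + 0.5 * wh for a, wh in zip(frame_a, warp(high, -shift))]
    return low, high


def mc_haar_inverse(low, high, shift):
    """Undo the lifting steps in reverse order with the same warps."""
    frame_a = [l - 0.5 * wh for l, wh in zip(low, warp(high, -shift))]
    frame_b = [h + wa for h, wa in zip(high, warp(frame_a, shift))]
    return frame_a, frame_b


frame_a = [1, 4, 2, 8, 5, 7]
frame_b = warp(frame_a, 2)  # the second frame is the first one moved by 2 samples

low, high = mc_haar_forward(frame_a, frame_b, shift=2)
print("high-pass with correct motion:", high)
print("high-pass without motion:     ", mc_haar_forward(frame_a, frame_b, shift=0)[1])
```

With the correct motion the high-pass frame vanishes, while ignoring the motion leaves a large residual; in both cases `mc_haar_inverse` recovers the original frames.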
|
| Contact |
Chuo-Ling Chang
and Sangeun Han
|
| Topics |
1. 3-D Wavelet Video Coding with Scalable Motion
Coding and Optimized Bit Allocation
In video coding, motion vectors are usually encoded
losslessly as side information. The number of bits available for coding
the motion vectors directly affects the efficiency of motion compensation,
hence significantly influences compression performance. In non-scalable
coders, various techniques have been used to optimize the portion of
bit-rate spent on motion vectors for a target total bit-rate. However, in
scalable video coding, such as 3-D wavelet video coding, the target total
bit-rate is unknown during encoding.
For 3-D wavelet video coding, a scalable motion coding
scheme using techniques similar to JPEG 2000 has been proposed in
conjunction with scalable coding of the subbands (frame texture) [4]. In
addition, it is shown that the distortion in the reconstructed video
frames can be approximated as a linear function of the distortion in the
reconstructed motion vectors. For such a scheme, the total bit-rate is the
sum of the motion bit-rate and the subband bit-rate, and the allocation
between the two plays an important role in the resulting compression
performance. In [4], the best trade-off between the motion bit-rate and
the subband bit-rate is determined by an exhaustive search from
combinations of certain bit-rates. The interaction between the motion
bit-rate and the subband bit-rate, and the resulting distortion in the
reconstructed video frames are yet to be established and verified in order
to achieve a joint bit allocation framework.
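The exhaustive search over rate combinations described above can be sketched as follows. The distortion model is purely hypothetical: it only mirrors the stated approximation that frame distortion grows linearly with motion-vector distortion, and it uses an assumed exponential rate-distortion curve for both components. It is not the model of [4], and the names and constants are illustrative.

```python
def total_distortion(motion_rate, subband_rate, motion_weight=4.0):
    """Hypothetical frame distortion for a given split of the bit-rate.

    Assumptions: both components follow D(R) = 2^(-2R), and the motion
    distortion contributes linearly to the frame distortion with weight
    `motion_weight`.
    """
    motion_distortion = 2.0 ** (-2.0 * motion_rate)
    subband_distortion = 2.0 ** (-2.0 * subband_rate)
    return motion_weight * motion_distortion + subband_distortion


def best_allocation(total_rate, candidate_motion_rates, distortion=total_distortion):
    """Exhaustively search the candidate motion rates for the lowest distortion.

    Returns (motion_rate, subband_rate, distortion), or None if no candidate
    fits within the total rate.
    """
    best = None
    for motion_rate in candidate_motion_rates:
        subband_rate = total_rate - motion_rate
        if subband_rate < 0:
            continue
        d = distortion(motion_rate, subband_rate)
        if best is None or d < best[2]:
            best = (motion_rate, subband_rate, d)
    return best


candidates = [0.5 * k for k in range(7)]  # 0.0, 0.5, ..., 3.0
for total_rate in (1.0, 2.0, 3.0):
    print(total_rate, best_allocation(total_rate, candidates))
```

Because the total bit-rate is unknown at encoding time in a scalable coder, a search like this would have to be repeated for every target rate, which is one motivation for the joint bit allocation framework the project asks for.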
|
| References |
[1] A. Secker and D.
Taubman, "Motion-compensated highly scalable video compression using
an adaptive 3D wavelet transform based on lifting," in Proc. IEEE Int.
Conf. on Image Processing 2001, Thessaloniki, Greece, Oct. 2001, vol. 2,
pp. 1029-1032.
[2] B. Pesquet-Popescu and V. Bottreau, "Three
dimensional lifting schemes for motion compensated video compression," in
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing 2001,
Salt Lake City, UT, USA, May 2001, vol. 3, pp. 1793-1796.
[3] L. Luo, J. Li, S. Li, et al., "Motion
compensated lifting wavelet and its application in video coding," in
Proc. IEEE Int. Conf. on Multimedia and Expo 2001, Tokyo, Japan, Aug.
2001, pp. 481-484.
[4] D. Taubman and A. Secker, "Highly scalable
video compression with scalable motion coding," in Proc. IEEE Int.
Conf. on Image Processing 2003, Barcelona, Spain, Sep. 2003, vol. 3, pp.
273-276.
|