|
Project List
| Authors | Topic | Contact | Proposal | Presentation | Report |
| Agarwal, Kim, Gupta | Performance-Complexity Tradeoff H.264 Motion Search | Eric and Sangeun | PDF | PPT | PDF |
| Chan, Guerrero, Tsang | Fast Macroblock-Adaptive Frame/Field Coding Selection in H.264 | Eric and Sangeun | PDF | PPT | PDF |
| Comer, Reinhardt, Yabaluri | Scalable 3D Wavelet Video Coding | Chuo-Ling | PDF | PPT | PDF |
| George, Zhang | Low Complexity Schemes for Long Term Memory Motion Compensation | Eric and Sangeun | PDF | PPT | PDF |
| Hristodorescu | Performance-Complexity Tradeoff H.264 Motion Search | Eric and Sangeun | PDF | PPT | PDF |
| Jalali, Maleki | Directional Lifting-Based Wavelets | Chuo-Ling | PDF | PPT | PDF |
| Jagannathan, Skraba | Distributed Compression of Lightfields with Wavelets | Xiaoqing | PDF | PPT | PDF |
| Lee, Song, Moon | Hash-Based ME | Shantanu | PDF | PPT | PDF |
| Lin, Sung, Yeh | Hash-Based ME (Scalable Feature Extraction for Remote ME) | David R-M | PDF | PPT | PDF |
| Mavlankar | Motion-Compensated Lifted Wavelet Transforms | Chuo-Ling | PDF | PPT | PDF |
| Wu, Pratx, Dehy | Hash-Based ME | Shantanu | PDF | PPT | PDF |
| Yoon, Mao | Rate-Distortion Optimized Video Streaming for SNR Scalable H.264 | Mark | PDF | PDF | PDF |
| Zymnis, Anbu (partial collaboration of Rajiv) | Hash-Based ME | David R-M | PDF | PPT | PDF |
Project Presentation Schedule
| TUESDAY, March 8 (2:45-4, Gates B3) |
| Time | Group | Topic |
| 2:45-2:50 | - | Introduction/Preparation |
| 2:50-3:05 | Jalali, Maleki | Directional Lifting-Based Wavelets |
| 3:05-3:15 | Mavlankar | Motion-Compensated Lifted Wavelet Transforms |
| 3:15-3:35 | Comer, Reinhardt, Yabaluri | Scalable 3D Wavelet Video Coding |
| 3:35-3:50 | Jagannathan, Skraba | Distributed Compression of Lightfields with Wavelets |
| 3:50-4 | Hristodorescu | Performance-Complexity Tradeoff H.264 Motion Search |
| THURSDAY, March 10 (2:45-4, Gates B3) |
| Time | Group | Topic |
| 2:45-3 | George, Zhang | Low Complexity Schemes for Long-Term Memory Motion Compensation |
| 3-3:20 | Agarwal, Gupta, Kim | Performance-Complexity Tradeoff H.264 Motion Search |
| 3:20-3:40 | Chan, Guerrero, Tsang | Fast Macroblock-Adaptive Frame/Field Coding Selection in H.264 |
| 3:40-4 | Lee, Moon, Song | Hash-Based ME |
| FRIDAY, March 11 (2:15-3:30, McCullough 115) |
| Time | Group | Topic |
| 2:15-2:35 | Anbu, Zymnis (Partial collaboration of Rajiv) | Hash-Based ME |
| 2:35-2:55 | Dehy, Pratx, Wu | Hash-Based ME |
| 2:55-3:15 | Lin, Sung, Yeh | Hash-Based ME (Scalable Feature Extraction for Remote ME) |
| 3:15-3:30 | Mao, Yoon | Rate-Distortion Optimized Video Streaming for SNR Scalable H.264 |
Course Project Topics
Old Project Topics
You may take a look at the old project topics for the second part of the extended version of EE398, EE398B, taught last year. However, some of these may no longer be relevant to the research carried out by those who posted them, or the problems mentioned may already have been solved. In addition, these project topics were proposed for a course in which the project represented most of the workload, since there were only two HW assignments. Therefore, it is strongly recommended that you focus on the new project topics below, or else contact the person who posted the topic and also consult with the course TA or the instructor.
New Project Topics
Please address your questions to the people listed in the specific topic, members of the Image, Video and Multimedia Systems Group. You may contact them before and after submitting your proposal, for advice. They may provide you with further references and code. Most papers are available at the IEEE Xplore web site, CiteSeer, or Google Scholar. If you'd like to work on an idea of your own, please consult with the course instructor.
I GENERAL VIDEO CODING |
| |
|
| Areas |
Motion-Compensated Video Coding
Motion compensation of video is explained in detail here.
|
| Topics |
Hash Codes for Motion Estimation of Video
Block-based motion compensation is ubiquitous in modern video coding standards, leading to substantial rate-distortion improvement over simpler methods such as frame differences or conditional replenishment. The following topic aims at designing a low-rate hash code that can be used instead of the original block to perform motion estimation remotely in important applications such as distributed video coding [1].
We follow the convention of using uppercase letters for random variables and lowercase letters for the values they take on, as well as for non-random symbols. Consider a digital video sequence. The current frame is divided into blocks of the same size. Let the random variable X represent one of those blocks in the current frame, and let Y represent the entire previous frame. Let Z denote the motion-compensated block associated with X. That is, consider shifts (in steps of, for instance, one pixel) of the co-located block in the previous frame Y; Z is the shifted block with the minimum squared difference with respect to X.
In some applications of video coding, in particular distributed video coding or Wyner-Ziv coding, we need to carry out the motion estimation for X remotely. Precisely, some 'hash code' transformation or statistic T=t(X) is transmitted which can be used instead of X to obtain the motion-compensated block Z' from Y. Y is also available remotely. Think of t(x) as some non-injective map. For instance, a downsampled version of X according to some chess-like pattern. Together with t(x), a method to perform the remote motion estimation to find Z' from T and Y is needed. This motion estimator is denoted as z(t,y). Thus Z'=z(T,Y). Finally, some lossless coding method for T will be required. Note that t(x)=x (identity function) and z(t,y)=argmin_z ||t-z||^2 (z in y) give back the original scenario in which X is known.
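The mapping from T and Y to Z' described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of [2]: the function names, the search radius, and the choice of comparing t applied to each candidate block against the received T are all hypothetical. With the identity hash the sketch reduces to conventional full-search block matching.

```python
import numpy as np

def identity_hash(block):
    """t(x) = x: recovers conventional motion estimation."""
    return np.asarray(block, dtype=float).ravel()

def checkerboard_hash(block):
    """Keep only the pixels on one colour of a chess-like pattern (non-injective)."""
    block = np.asarray(block, dtype=float)
    rows, cols = np.indices(block.shape)
    return block[(rows + cols) % 2 == 0]

def remote_motion_estimate(t_value, prev, top, left, size, radius, t):
    """Sketch of z(t, y): search integer shifts of the co-located block in prev.

    Returns (dy, dx, block) for the candidate whose hash is closest,
    in squared error, to the received hash t_value.
    Assumption: candidates are hashed with the same t before comparison.
    """
    height, width = prev.shape
    best, best_cost = None, float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # candidate would fall outside the previous frame
            cand = prev[r:r + size, c:c + size]
            cost = float(np.sum((t(cand) - t_value) ** 2))
            if cost < best_cost:
                best, best_cost = (dy, dx, cand), cost
    return best
```

Calling `remote_motion_estimate(checkerboard_hash(x), prev, top, left, 8, 4, checkerboard_hash)` uses half the pixels of an 8x8 block X as the hash; the distortion D' is then the squared difference between X and the returned block.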
Note that T will often require fewer bits than X, but the motion-compensated block Z' obtained by using T instead of X won't be as close to X as Z, obtained knowing X perfectly. The rate required by this code is R', and a possible measure of distortion is D'=E[||X-Z'||^2], versus R for X and D=E[||X-Z||^2]. Intuitively, R'<=R, and it is clear that D'>=D (you could also use D'-D as a measure of distortion).
An additional example of hash code t(x) is the following. Let M denote the mean value of the pixels in X. T=t(X) could be defined as a block of the same size as X, where each pixel is 0 if the corresponding pixel of X is less than M, and 1 otherwise. This gives some sort of binary pattern that should give reasonable motion-compensated estimates Z', requiring a rate much lower than that of X. Alternatively, t(x) could be the DCT of X, in which each transformed coefficient is quantized with a uniform quantizer of interval width w_ij. An interesting problem would be to find the widths leading to the best RD performance, or equivalently, the optimal rate allocation. An additional example of motion estimator z(t,y) is z(t,y)=argmin_z sum_ij a_ij (t_ij - z_ij)^2, where z is a shifted block inside the previous frame y, and a_ij are weights to be determined.
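The three building blocks mentioned in this paragraph can be written down directly. The sketch below is a hedged illustration: the function names are hypothetical, the quantizer is applied to coefficients that are assumed to have been computed elsewhere (the DCT itself is omitted), and the rounding convention is an assumption.

```python
import numpy as np

def mean_threshold_hash(block):
    """Binary pattern: 0 where the pixel is below the block mean M, 1 otherwise."""
    block = np.asarray(block, dtype=float)
    return (block >= block.mean()).astype(np.uint8)

def quantize(coeffs, widths):
    """Uniform quantizer indices with interval width w_ij per coefficient.

    Assumption: mid-tread quantizer with numpy's rounding (ties go to even).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    widths = np.asarray(widths, dtype=float)
    return np.round(coeffs / widths).astype(int)

def weighted_cost(t, z, a):
    """sum_ij a_ij (t_ij - z_ij)^2 for one candidate block z."""
    diff = np.asarray(t, dtype=float) - np.asarray(z, dtype=float)
    return float(np.sum(np.asarray(a, dtype=float) * diff ** 2))
```

The estimator z(t,y) is then the candidate block that minimizes `weighted_cost` over all shifts considered; choosing the weights a_ij and widths w_ij is left open, as in the text.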
The objective of this project is to come up with a few transformations t(x), together with motion estimators z(t,y) and lossless coding methods for T, and compare their RD performance. We will propose several possible high-level structures for t(x) and z(t,y), and you'll be asked to figure out the details and find optimal parameters, empirically or heuristically, from a set of test images. Of course, you are encouraged to propose your own transformations and estimators. This is a beautiful, new problem, and very little work has been carried out. You can try and improve the first steps taken in [2].
This is intended to be a combination of empirical work in Matlab or C, conceptual thinking, and clever heuristics, but partial theoretical analyses are always a plus. One of the advantages of this project is the productivity curve. Since we already propose a couple of structures, it shouldn't be too hard to obtain first valuable results, and there is always room for extensions and improvements since you could always find algorithms on your own. Furthermore, no expert background in any specific video coding topic is required, beyond the basics presented in the course. No knowledge of distributed video coding, the main application of this problem, is required at all. A group of 2 to 3 students is recommended.
If interested, please read contact instructions below.
Details
Please read below a list of recommendations for those groups interested in the project on hash-based motion estimation. We’re making these recommendations in order to unify the experiments carried out and the results presented. The notation is as in the topic description in the web page: X=current block, T=hash code=t(X), Z=candidate blocks in the previous frame.
1. BLOCK SIZE. You can use any block size for X and Z, but express your results normalized per pixel to facilitate comparison with other groups. We suggest 8x8.
2. PLOT RD PERFORMANCE. Compare the rate-distortion performance of the hash-based motion estimation methods analyzed, preferably in a single plot.
3. DISTORTION. The vertical axis will represent PSNR (in dB), computed from the average distortion D per pixel in the conventional way.
4. Let Z_opt be the motion-compensated block obtained using the current block X, which minimizes ||X-Z||^2, that is, the MSE between X and Z. Let Z_subopt be the motion-compensated block obtained using the hash T instead of X. The distortion is defined as D_subopt=||X-Z_subopt||^2. Clearly, D_subopt>=D_opt. D_subopt is the distortion you’re asked to plot. You can also draw a horizontal line at the value D_opt for reference.
5. RATE. The rate R should be on the horizontal axis, in bits/pixel, ranging from 0 to 0.15.
6. You may make entropy measurements instead of carrying out actual arithmetic coding, but you need to justify the practicality of your design. We illustrate this with the following example. Suppose that your hash code T is a block of 4x4 pixels drawn from X. Computing the entropy of T wouldn’t in general correspond to a practical lossless coding design because the potential alphabet would have 256^16 elements. In this example, instead, you could do some zig-zag scan, use run-level coding and make entropy measurements on the pairs obtained, just as explained in the lecture on transform coding.
7. TRAINING & TEST SEQUENCES. Use the first 50 frames of the CIF sequences ‘foreman’, ‘bus’, ‘carphone’ and optionally, ‘mobile’, to train and test your system. By training, we mean to optimize any parameters such as quantization widths. You don’t have to use the entire data if your training algorithm is not fast enough. All these sequences are available at http://www.stanford.edu/class/ee398/samples.htm. Provide A SINGLE rate-distortion plot based on your training set (which may contain more than one sequence). If you have time, we would like to ask you to test your method on a sequence NOT used for training (with the parameters optimized for the training sequences). Again, you don’t need to use all the blocks. You can provide a rate-distortion plot for the test sequence in addition to the plot for the training set.
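The quantities to be plotted can be computed as sketched below. The assumptions are labeled in the comments: 8-bit video (peak value 255) for the PSNR, and a first-order entropy estimate that treats the coded symbols (for instance run-level pairs) as independent. All function names are hypothetical.

```python
import math
from collections import Counter

def psnr_db(mse_per_pixel, peak=255.0):
    """PSNR in dB from the average squared error per pixel (assumes 8-bit video)."""
    if mse_per_pixel == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / mse_per_pixel)

def empirical_entropy_bits(symbols):
    """First-order entropy estimate, in bits per symbol, from observed frequencies."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def rate_bits_per_pixel(symbols, symbols_per_block, block_pixels=64):
    """Rate R normalized per pixel; block_pixels=64 corresponds to 8x8 blocks."""
    return empirical_entropy_bits(symbols) * symbols_per_block / block_pixels
```

The symbols can be any hashable values, such as (run, level) tuples, so the same estimate applies to the zig-zag and run-level example in item 6.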
|
| Contacts |
David Rebollo-Monedero (General TA) (adapted from idea suggested by Prof. Girod)
Please email me as soon as possible if you're interested. I'll probably hold a short meeting around or on Friday, Jan 28, to explain the details of the project and help you decide. Good luck!
|
| References |
[1] B. Girod, A. Aaron, S. Rane and D. Rebollo-Monedero, "Distributed Video Coding", in Proc. IEEE, Special Issue on Advances in Video Coding and Delivery, January 2005 (invited paper) [PDF]
[2] Shantanu Rane, "Hash-Aided Motion Estimation and Rate Control for Distributed Video Coding,"
EE 392J Digital Video Processing Course Project, Winter 2004 [PDF report]
|
| |
|
| Areas |
Distributed Video Compression, Error-Resilient Video Transmission
See background below.
|
| Topics |
Error-Resilient Video Transmission by Predicting Visibility of Packet Losses
Background
To attain high compression ratios, video coding algorithms rely on predictive coding of the video frames. This involves estimating the motion between the current and the previous frame(s), compensating for this motion, and finding the prediction error between the current and previous frame. The transmitted bitstream then consists of the motion vectors and the lossily compressed prediction error signal. This gives compression, but reduces the error-resilience, i.e., loss of a single frame results in incorrect decoding of the subsequent frames. Error-resilient transmission of this compressed bitstream can be achieved by a number of methods, for example by introducing intra-coded macroblocks, or by explicitly transmitting parity symbols to protect the transmitted bitstream against transmission errors. This latter class of methods includes conventional techniques like FEC and the recently proposed Systematic Lossy Error Protection (SLEP) scheme, which uses Wyner-Ziv Video Coding [1]. In either case, a certain portion of the total transmitted bit-rate is used to transmit parity symbols which can correct for transmission errors, if any. The difference is that in FEC, the errors are corrected exactly, whereas in SLEP, distributed video compression is used to correct the errors up to a small residual distortion. Please see [1,2] for the SLEP principle, implementation details, and advantages of choosing SLEP over FEC.
Idea
This project suggestion is based on the observation that different portions of the video sequence have different susceptibility to transmission errors [3]. For example, loss of the immobile portions of a video scene may be tolerated since these errors do not propagate to subsequent frames. Let the video frame be divided into slices (groups of macroblocks). An interesting question to pose at the encoder is whether a slice ought to be protected or not. This choice can be made by observing the motion vectors of the macroblocks which make up the slice. For example, if the mean motion in the X and Y directions is less than a single pixel and if the variance of the motion vectors is sufficiently low, then the slice represents a low-motion area of the video frame, and need not be protected. Can a more sophisticated method be devised to make this decision? If this is done correctly, the low-motion slices would not be protected, and the bit-rate for error protection would be used only for the medium- and high-motion slices. How much improvement in error-resilience can be obtained in this way, as opposed to indiscriminately protecting all slices?
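The simple baseline rule described above (mean motion below one pixel and low variance) could look as follows. This is a sketch only: the variance threshold is a hypothetical placeholder, since the text leaves "sufficiently low" unspecified, and the function name is invented for the example.

```python
import numpy as np

def slice_needs_protection(motion_vectors, mean_thresh=1.0, var_thresh=0.5):
    """Decide whether a slice should be protected, from its macroblock motion vectors.

    motion_vectors: array-like of shape (n_macroblocks, 2) holding (mv_x, mv_y) in pixels.
    var_thresh is an assumed value; it would have to be tuned experimentally.
    """
    mvs = np.asarray(motion_vectors, dtype=float)
    mean_small = np.all(np.abs(mvs.mean(axis=0)) < mean_thresh)
    var_small = np.all(mvs.var(axis=0) < var_thresh)
    low_motion = bool(mean_small and var_small)
    return not low_motion
```

Note that the variance test matters: a slice whose vectors cancel on average (zero mean) can still contain strong motion, and the rule then protects it.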
Possible Application Areas
1. In the systematic lossy error protection (SLEP) context, these questions need to be answered in order to construct a more efficient Wyner-Ziv video codec.
2. On a more general level, a method to ascertain the visibility of slice losses can be used to design improved schemes for error-concealment.
|
| Contacts |
Shantanu Rane
|
| References |
[1] Bernd Girod, Anne Aaron, Shantanu Rane, David Rebollo-Monedero, "Distributed Video Coding" (Proceedings of the IEEE, Special Issue on Advances in Video Coding and Delivery. January 2005.)
[2] S. Rane and B. Girod, "Analysis of Error-Resilient Video Transmission based on Systematic Source-Channel Coding" (Picture Coding Symposium (PCS 2004), Invited Paper, San Francisco, CA, USA)
[3] Amy R. Reibman, Sandeep Kanumuri, Vinay Vaishampayan, and Pamela C. Cosman, "Visibility of individual packet losses in MPEG-2 video", (ICIP 2004, Singapore.)
|
| |
|
| Areas |
H.264 Video Coding
|
| Topics |
Modeling Video Distortion for Scalable H.264
Systems that stream video over unreliable channels are greatly improved by
transmission scheduling and dynamic source pruning. A scheduling and
pruning algorithm examines channel statistics and packet arrival deadlines
to determine which packets to drop and when remaining packets
should be sent, in order to maximize video quality at the client. In
order to make good scheduling decisions, the algorithm needs to model the
decoding distortion that results as a function of which packets arrive in
time to be decoded and which packets are late or are dropped.
While much research has been done on distortion modeling [1,2,3,4,5,6], it
is not known how applicable current models are for the new Scalable H.264
coder proposed in [can't find ref. for the moment, rest assured the
encoder will be provided!]. A suitable term project would be to examine
how applicable past models are for the new Scalable H.264 coder, and then
to develop a new, simple model for distortion as a function of missing
packets for this coder.
I'll help you get started!
|
| Contacts |
Mark Kalman
|
| References |
[1] M. Kalman and B. Girod, "Optimized Transcoding Rate Selection and
Packet Scheduling For Transmitting Multiple Video Streams Over a Shared
Channel," Proc. IEEE International Conference on Image Processing,
ICIP-2005, Genoa, Italy, September 2005 (submitted). (www.stanford.edu/~mkalman/RESEARCH/research.html)
[2] P. A. Chou and Z. Miao, "Rate-distortion optimized streaming of
packetized media," Microsoft Research, Tech. Rep. MSR-TR-2001-35,
February 2001 (also submitted to IEEE Transactions on Multimedia).
[3] J. Chakareski, J. Apostolopoulos, S. Wee, W. Tan, and B. Girod,
"Rate-Distortion Hint Tracks for Adaptive Video Streaming," IEEE
Transactions on Circuits and Systems for Video Technology, May 2004.
[4] M. Kalman, P. Ramanathan, and B. Girod, "Rate distortion optimized
streaming with multiple deadlines," in IEEE International Conference
on Image Processing, Barcelona, Spain, September 2003.
[5] J. Chakareski, J. Apostolopoulos, W. Tan, S. Wee, and B. Girod,
"Distortion Chains for Predicting the Video Distortion for General Packet
Loss Patterns," IEEE ICASSP, May 2004.
[6] Y. Liang, J. Apostolopoulos, and B. Girod, "Analysis of Packet Loss
for Compressed Video: Does Burst-Length Matter?," submitted to IEEE
Trans. on Circuits and Systems for Video Technology, November 2003.
|
| |
|
| Areas |
H.264 Video Coding
|
| Topics |
Performance/Complexity Trade-Offs in H.264 Motion Search
The H.264 video compression standard allows motion compensation using a variety of block sizes and across multiple reference frames. This project investigates the various performance/complexity tradeoffs possible.
Experiments should be carried out with a set of test sequences by modifying the public-domain reference software for H.264 compression. Reduced-complexity motion search algorithms should be designed and evaluated with regards to their rate-distortion performance. Complexity of each algorithm can be assessed with simplifying assumptions for a single-chip H.264 encoder, taking into account arithmetic/logic operations and off-chip memory access.
A reasonable project result could be, for example, an algorithm with less than 1/2 the complexity of exhaustive search around predicted vectors using a +/-32x32 window for up to 5 reference frames, with a PSNR difference below 0.2 dB on average.
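To get a feel for the baseline, the size of the exhaustive search can be counted directly. The sketch below assumes that "+/-32x32" means integer displacements from -32 to +32 in each direction, a single 16x16 block size, and one absolute difference per pixel per candidate; sub-pel refinement, variable block sizes, and memory traffic are ignored. The function name is hypothetical.

```python
def exhaustive_search_cost(radius=32, ref_frames=5, block=16):
    """Count candidates and pixel differences for full search on one macroblock."""
    positions = (2 * radius + 1) ** 2        # displacements per reference frame
    candidates = positions * ref_frames      # over all reference frames
    abs_diffs = candidates * block * block   # one difference per pixel per candidate
    return positions, candidates, abs_diffs

positions, candidates, abs_diffs = exhaustive_search_cost()
print(positions, candidates, abs_diffs)  # 4225 21125 5408000
```

Under these assumptions, halving the complexity means evaluating the equivalent of fewer than about 10,560 full candidates per macroblock.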
This topic is of great practical relevance. We can arrange meetings with engineers involved in H.264 encoder chip design, if desired.
|
| Contacts |
Prof. Bernd Girod (adapted from topic suggested by Mobilygen)
|
| References |
H. Chung, A. Ortega, A. A. Sawchuk, "Low complexity motion estimation for long term memory motion compensation," Proc. Visual Communications and Image Processing, VCIP 2002, San Jose, CA, January 2002.
V. Lappalainen, A. Hallapuro, T. Hamalainen, "Complexity of optimized H.26L video decoder implementation", IEEE Trans. CSVT, Vol. 13, No. 7, pp. 717-725, July, 2003
M. Horowitz, A. Joch, F. Kossentini, A. Hallapuro, "H.264/AVC baseline profile decoder complexity analysis", IEEE Trans. CSVT, Vol. 13, No. 7, pp. 704-716, July, 2003
M. Gallant, G. Cote, F. Kossentini, "An efficient computation-constrained block-based motion estimation algorithm for low bit rate video coding", IEEE Trans. CSVT, vol. 8, No. 12, pp. 1816-1823, Dec. 1999
|
| |
|
| Areas |
H.264 Video Coding
|
| Topics |
Macroblock-Adaptive Frame/Field Coding in H.264
For interlaced content, the video compression standard H.264 offers the flexibility of encoding macroblocks in frame or field modes. The entire frame is encoded as a picture, but individual pairs of vertically adjacent macroblocks can be split into fields for prediction and residual coding. This feature is referred to as Macroblock-Adaptive Frame-Field coding (MBAFF). The MBAFF decision is solely at the encoder's discretion and is outside of the scope of the standard. This project investigates the various performance/complexity tradeoffs possible.
Experiments should be carried out with a set of interlaced test sequences by modifying the public-domain reference software for H.264 compression. Different MBAFF mode decision algorithms should be designed and evaluated with regards to their rate-distortion performance. Of particular interest is the coding efficiency gain of MBAFF as compared to frame-level frame/field adaptation and encoding of the entire sequence as either frames or fields.
Complexity of each algorithm can be assessed with simplifying assumptions for a single-chip H.264 encoder, taking into account arithmetic/logic operations and off-chip memory access.
This topic is of great practical relevance. We can arrange meetings with engineers involved in H.264 encoder chip design, if desired.
|
| Contacts |
Prof. Bernd Girod (adapted from topic suggested by Mobilygen)
|
| References |
V. Lappalainen, A. Hallapuro, T. Hamalainen, "Complexity of optimized H.26L video decoder implementation", IEEE Trans. CSVT, Vol. 13, No. 7, pp. 717-725, July, 2003
M. Horowitz, A. Joch, F. Kossentini, A. Hallapuro, "H.264/AVC baseline profile decoder complexity analysis", IEEE Trans. CSVT, Vol. 13, No. 7, pp. 704-716, July, 2003
M. Gallant, G. Cote, F. Kossentini, "An efficient computation-constrained block-based motion estimation algorithm for low bit rate video coding", IEEE Trans. CSVT, vol. 8, No. 12, pp. 1816-1823, Dec. 1999
|
II WYNER-ZIV VIDEO CODING |
| |
|
| Areas |
Wyner-Ziv Video Coding
Wyner-Ziv coding, named after [1], consists of lossy source coding with decoder side information. More precisely, the source data X is encoded with a rate constraint, and decoded with a certain distortion, using some side information Y available at the decoder only. Although the values Y takes on are not available at the encoder, the statistical dependence between X and Y is known, and exploited when designing the entire system. Information-theoretical studies suggest that the compression efficiency achieved can be similar to the case in which the side information Y is available at the encoder as well. In fact, in the lossless case, known as Slepian-Wolf coding, the rate in both cases is equal to H(X|Y) [2].
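As a small numerical illustration of the Slepian-Wolf rate H(X|Y), the conditional entropy can be computed from a joint distribution. The function name and the example distribution below are assumptions chosen for the illustration, not taken from the references.

```python
import math

def conditional_entropy_bits(joint):
    """H(X|Y) in bits, where joint[x][y] = P(X = x, Y = y)."""
    p_y = [sum(column) for column in zip(*joint)]  # marginal of Y
    h = 0.0
    for row in joint:
        for p_xy, py in zip(row, p_y):
            if p_xy > 0:
                h -= p_xy * math.log2(p_xy / py)
    return h

# X is a uniform bit and Y equals X flipped with probability 0.1.
correlated = [[0.45, 0.05],
              [0.05, 0.45]]
print(round(conditional_entropy_bits(correlated), 3))  # 0.469
```

In this example X alone would need 1 bit per sample, while about 0.469 bits suffice when Y is known at the decoder, whether or not the encoder also sees Y.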
Wyner-Ziv coding – source coding with side information only at the decoder – has been shown to be useful and suitable for certain video applications. In our recent work we applied Wyner-Ziv coding to build an intraframe encoder & interframe decoder system which has a very simple encoder, suitable for low-complexity video applications, such as mobile camera-phones, wireless PC cameras and surveillance cameras. We have also used Wyner-Ziv coding techniques to develop a novel error resiliency scheme for video broadcasting, which outperforms traditional forward error correction schemes and does not require a layered video bitstream for graceful quality degradation. Although this has been an active research area in the last few years, there are still many open problems which need to be solved to make Wyner-Ziv coding more practical for real-world systems.
|
| Topics |
1. Codes for Wyner-Ziv Coding
Channel codes have been shown to work well for source coding with decoder side information. In our current systems, we use a turbo codec as a near lossless Slepian-Wolf codec. Design and study the compression efficiency of other channel codes, especially Low Density Parity Check (LDPC) codes, and investigate their practicality of use for video systems. One important aspect to study is the rate flexibility of the codes for changing source statistics.
2. Rate Control for Wyner-Ziv Coding
For Wyner-Ziv coding scenarios, the rate is dependent on the statistics between the source and the side information. However, the side information is not available at the encoder. Therefore, determining the rate at the encoder is an important issue. Our current rate control assumes feedback from the decoder to the encoder. Investigate better Wyner-Ziv rate control schemes, especially for our current low-complexity video encoder.
3. Other Applications of Wyner-Ziv Coding
We have shown that Wyner-Ziv coding can be used for low-complexity video encoding, error resiliency schemes for video broadcasting and compression for light field images. Can Wyner-Ziv coding be used for other video applications, such as layered video coding, multiple description coding, etc.?
|
| Contacts |
Anne Aaron
|
| References |
[1] A. D. Wyner, J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inform. Theory, vol. IT-22, Jan. 1976.
[2] J. D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 471–480, Jul. 1973.
[3] B. Girod, A. Aaron, S. Rane and D. Rebollo-Monedero, "Distributed Video Coding", in Proc. IEEE, Special Issue on Advances in Video Coding and Delivery, January 2005 (invited paper) [PDF]
|
III WAVELETS |
| |
|
| Areas |
3-D Wavelet Video Coding
Over the years, many researchers have proposed 3-D wavelet coding of video sequences. Thanks to the multi-resolution nature of wavelet transforms as well as efficient embedded coding of the wavelet coefficients, 3-D subband video coding provides great support for scalability, a very desirable feature when transmitting video over the network. However, linear transforms applied in the temporal direction may be inefficient if the motion between frames is not fully exploited. Many attempts have been made to incorporate motion compensation into the 3-D wavelet video coding framework. Earlier works are somewhat unsatisfactory in terms of the rate-distortion coding performance because the motion vector field is severely restricted and the temporal transform is usually limited to the two-tap Haar wavelet. Recently, motion-compensated lifting has been proposed, which successfully incorporates unrestricted motion compensation into 3-D wavelet coding and provides compression efficiency approaching the state-of-the-art predictive video coding schemes [1][2][3].
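The key property of lifting, that the transform stays invertible whatever operator is used in the predict and update steps, can be seen in a short sketch. The version below is an unnormalized temporal Haar transform on a pair of frames; the `predict` and `update` arguments stand in for motion compensation and its reverse mapping and are hypothetical placeholders, not the operators used in [1][2][3].

```python
import numpy as np

def haar_lift_forward(even, odd, predict=lambda f: f, update=lambda f: f):
    """One temporal Haar lifting step on a pair of frames."""
    high = odd - predict(even)        # predict step: temporal high-pass frame
    low = even + 0.5 * update(high)   # update step: temporal low-pass frame
    return low, high

def haar_lift_inverse(low, high, predict=lambda f: f, update=lambda f: f):
    """Undo the lifting steps in reverse order with the same operators."""
    even = low - 0.5 * update(high)
    odd = high + predict(even)
    return even, odd
```

With the default identity operators, `low` is the average of the two frames and `high` their difference. Replacing `predict` by a warping function (for example a shift) changes the subbands but not the fact that the inverse recovers both frames.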
|
| Topics |
Scalable Motion Vectors for 3-D Wavelet Video Coding
In video coding, motion vectors are usually encoded losslessly as side information. The number of bits available for coding the motion vectors directly affects the efficiency of motion compensation, hence significantly influences compression performance. In non-scalable coders, various techniques have been used to optimize the portion of bit-rate spent on motion vectors for a target total bit-rate. However, in scalable video coding, such as 3-D wavelet video coding, the target total bit-rate is unknown during encoding. Therefore, it is desirable to have a scalable coding scheme also for the motion vectors so that the portion of the motion vector bit-rate can be adjusted adaptively according to the target total bit-rate.
Scalable motion coding for 3D wavelet video was first proposed in [4]. A scalable motion representation is achieved by decomposing the motion field, as a two-component image, with 2D wavelet transforms followed by embedded bitplane coding. In the implementation described in [4], the motion representation consists of several quality layers. The boundary of each quality layer provides a truncation point of the bitstream that is optimized to minimize the distortion of the reconstructed motion field for that particular bitstream length, following the EBCOT scheme in JPEG2000. Note that although the wavelet transform naturally provides resolution scalability for the motion representation, in [4] motion fields at full spatial resolution (with degraded quality) are still used for video reconstruction at lower texture spatial resolution. This is because the scalable motion bitstream is organized and optimized to be progressive in quality, rather than in resolution. In other words, the scalable motion coding scheme described in [4] was intended for quality scalability, rather than resolution scalability, despite the fact that a wavelet transform is applied to the motion fields. For instance, directly taking the spatial low-pass subband (spatially averaged and subsampled) of the motion field estimated at the full resolution may not be optimal for reconstructing the lower-resolution video.
Another problem of the work in [4] is that the motion model it adopted leads to motion fields lying on a regular grid. Therefore, 2D wavelet transforms can be directly applied. For motion models with variable block size, for instance, which have been shown to benefit motion compensation, the wavelet motion coding method proposed in [4] no longer works. Several works [5][6][7][8] were presented at ICIP 2004 to achieve scalable motion coding for variable-block-size motion compensation. They all share a general structure of having a multi-layer motion representation. The base layer is coded using conventional AVC-like motion coding methods. The enhancement layers are predictively coded using information from previous layers.
In particular, [5] proposed to apply a separate rate-constrained motion estimation for each motion layer, working at a certain spatial resolution of the video and with a certain rate-distortion tradeoff (lambda in rate-constrained ME), using the estimated motion from previous layers as a starting point. As a result, each motion layer is optimized for a certain resolution/quality operation point of the video. The authors in [5] also observed that the video distortion resulting from the motion distortion is nearly constant in terms of MSE over a wide range of texture bit-rates (however, this conflicts with Fig. 5 and 6 in [4], and hence needs further investigation). Therefore, the optimal number of motion layers to be included at a certain video rate (optimal motion/texture trade-off) can be easily obtained.
In this project, students are encouraged to compare the various scalable motion vector estimation and coding schemes previously proposed and possibly come up with a novel algorithm. Having a scalable motion vector coder, optimal bit allocation between the motion vector and the texture for 3-D wavelet video coding can also be investigated.
|
| Contacts |
Chuo-Ling Chang and Sangeun Han
|
| References |
[1] A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting," in Proc. IEEE Int. Conf. on Image Processing 2001, Thessaloniki, Greece, Oct. 2001, vol. 2, pp. 1029-1032.
[2] B. Pesquet-Popescu and V. Bottreau, "Three dimensional lifting schemes for motion compensated video compression," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing 2001, Salt Lake City, UT, USA, May 2001, vol. 3, pp. 1793-1796.
[3] L. Luo, J. Li, S. Li, et al., "Motion compensated lifting wavelet and its application in video coding," in Proc. IEEE Int. Conf. on Multimedia and Expo 2001, Tokyo, Japan, Aug. 2001, pp. 481-484.
[4] A. Secker and D. Taubman, "Highly Scalable Video Compression with Scalable Motion Coding", IEEE Trans. on Image Processing, Aug 2004
[5] R. Xiong, J. Xu, F. Wu, S. Li, Y.-Q. Zhang, "Layered Motion Estimation and Coding for Fully Scalable 3D Wavelet Video Coding", ICIP 2004
[6] J. Barbarien et al., "Scalable Motion Vector Coding", ICIP 2004
[7] G. Boisson, E. Francois, C. Guillemot, "Accuracy-Scalable Motion Coding for Efficient Scalable Video Compression", ICIP 2004
[8] C.-Y. Tsai et al., "Enhanced Motion Estimation for Interframe Wavelet Video Coding", ICIP 2004
|
IV VIDEO STREAMING |
| |
|
| Areas |
Video Streaming
|
| Topics |
Low-Complexity Scheduler for Video Streaming
In a video streaming application, a scheduler periodically selects the
set of packets that should be sent to a receiver. In the simplest case,
this can be done sequentially and pictures are transmitted
chronologically. This is not always the best solution, especially in
bandwidth limited or lossy environments. In the first case, the network
does not have enough bandwidth to accommodate all the packets of a video
clip, hence, some packets need to be dropped or a lower quality should
be chosen. In the second case, because of low latency requirements,
retransmissions are not always possible or efficient. One of the
scheduler's tasks is to control potential retransmissions.
Recently, a theoretical framework was set up for analyzing this packet
scheduling problem formally [1]. Optimized schedulers were proposed
which try to minimize some Lagrangian cost of video distortion and rate
[2] or of some other network cost [3]. Although these schedulers were
shown to achieve high performance, their complexity is still too high.
In addition, several assumptions are necessary for determining optimized
transmission schedules, as the search space of all possible schedules
becomes exponentially large with the number of packets considered.
In this project, your task will be to analyze a typical transmission
scenario and design a low-complexity scheduler which adaptively decides
which packets should be transmitted over the network. Your design will
rely on rules which you will draft from formal distortion analysis,
common sense or observation! You will be provided with an implementation
of the scheduler proposed in [3], operating on a wireless ad hoc network
simulated in NS-2. Your goal will be to design a scheduler which
outperforms the complex scheduler in terms of complexity and of
performance. You will then collect experimental results and present your
findings.
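As one possible starting point for such a rule-based design, the sketch below applies a Lagrangian test to each packet independently: a packet is sent only if it can still arrive before its deadline and its distortion reduction outweighs its rate cost. This is not the scheduler of [2] or [3]. The packet fields, the fixed delay estimate, and the independence between packets (no decoding dependencies) are simplifying assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    name: str
    size_bits: int
    distortion_reduction: float  # assumed drop in distortion if decoded on time
    deadline: float              # playout deadline in seconds

def schedule(packets, now, lam, delay):
    """Select packets with d_i > lam * r_i that can still meet their deadline.

    lam is the Lagrange multiplier trading distortion against rate;
    delay is an assumed one-way network delay in seconds.
    The selection is returned in earliest-deadline-first order.
    """
    chosen = [p for p in packets
              if now + delay <= p.deadline
              and p.distortion_reduction > lam * p.size_bits]
    return sorted(chosen, key=lambda p: p.deadline)
```

The cost of this rule is linear in the number of packets, in contrast to the exponentially large space of schedules mentioned above; the price is that dependencies between packets and retransmission opportunities are ignored.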
|
| Contacts |
Eric Setton
|
| References |
[1] P. A. Chou and Z. Miao, "Rate-distortion optimized streaming of
packetized media," Microsoft Research Technical Report MSR-TR-2001-35,
February 2001. http://research.microsoft.com/~pachou/pubs/ChouM01tr.ps
[2] M. Kalman and B. Girod, "Rate-Distortion Optimized Streaming of
Video With Multiple Independent Encodings," Proc. IEEE International
Conference on Image Processing, ICIP-2004, Singapore, October 2004.
[3] E. Setton and B. Girod, "Congestion-distortion optimized scheduling
of video over a bottleneck link," Proceedings MMSP 2004, pp 99-102,
Siena, Italy, October 2004.
|
| |
|
|