EE368C Project
Winter 2001
Peter Chou and Prashant Ramanathan
A light field [1] is a 4D representation of radiance in free space that can be used to render a static 3D scene from arbitrary views. Light field data sets are constructed by sampling many 2D images of a scene taken from multiple viewpoints. Data compression is essential for making light fields tractable for rendering, transmission, and storage [2].
It has been shown that representations combining light field and 3D geometry data yield benefits for compression [3]. In this project, we consider the problem of compressing light fields with known 3D geometry. We first briefly review two previous approaches to this problem: model-based coding [3] and surface light fields [4]. We will show how they are related and propose a method that combines ideas from both.
In the model-based approach, Magnor and Girod [3] use 3D geometry that is constructed from the light field, which consists of a set of images. A texture map is generated for each light field image by projecting the image onto the geometry model. The set of images is parameterized as a 2D array corresponding to the set of viewing directions. This 2D set of 2D texture maps is compressed using a 4D wavelet coder.
Wood et al. [4] parameterize the light field on the surface of the object, thereby creating a surface light field. For each point on the surface of the model, a lumisphere is used to represent the radiance in all directions from that surface point. Each color on the lumisphere represents the color of the object surface point as seen from a particular viewing direction. The lumisphere is interpolated and faired to obtain a continuous function representing all viewing directions. Two methods of compressing the lumispheres are proposed. The first is function quantization, which is a generalization of vector quantization. The second is principal function analysis, which is based on principal component analysis.
These two representations are essentially transposed parameterizations of the same 4D light field data (see Figure 1). In the texture map based case, there is one texture map for each view, and different points within a texture map correspond to different points on the surface of the geometry. In the surface light field representation of Wood et al. [4], each lumisphere corresponds to a different point on the object surface, and points within a lumisphere represent the different views.


Therefore, the multi-view texture map and surface light field representations are based on the same underlying light field data. This allows us to combine the advantages of both approaches. With texture maps, we can exploit spatial coherency using a 4D wavelet coder. With surface light fields, the diffuse and specular components are coded separately, allowing for a reparameterization that improves coherency between lumispheres.
We can establish a correspondence between pixels in the texture map and points on the surface of the model (see Figure 2). This allows us to explicitly relate the two representations. This also allows us to examine whether the texture maps used in model-based coding adequately sample the surface of the model.

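We compute the transformation that maps points in 2D texture map space to 3D model space for each triangle in the model as follows. Let the texture map coordinates of the vertices of a triangle be (s0, t0), (s1, t1), and (s2, t2), and let the corresponding vertices in model space be (x0, y0, z0), (x1, y1, z1), and (x2, y2, z2). We seek the 3x3 matrix A, acting on homogeneous texture coordinates, where

$$A \begin{bmatrix} s_0 & s_1 & s_2 \\ t_0 & t_1 & t_2 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ z_0 & z_1 & z_2 \end{bmatrix}$$

Solving for A,

$$A = \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ z_0 & z_1 & z_2 \end{bmatrix} \begin{bmatrix} s_0 & s_1 & s_2 \\ t_0 & t_1 & t_2 \\ 1 & 1 & 1 \end{bmatrix}^{-1}$$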
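As an illustration, the per-triangle transformation can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the project: it assumes A is a 3x3 matrix that maps homogeneous texture coordinates (s, t, 1) to model coordinates (x, y, z), and the function names `invert_3x3`, `texture_to_model_matrix`, and `apply_map` are hypothetical.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix (list of rows) with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("triangle is degenerate in texture space")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def matmul_3x3(p, q):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def texture_to_model_matrix(tex, model):
    """tex: three (s, t) pairs; model: three (x, y, z) triples.

    Returns A such that A * (s_k, t_k, 1) = (x_k, y_k, z_k) for k = 0, 1, 2.
    The homogeneous 3x3 form is an assumption of this sketch.
    """
    # Columns of S are the homogeneous texture coordinates of the vertices.
    S = [[tex[0][0], tex[1][0], tex[2][0]],
         [tex[0][1], tex[1][1], tex[2][1]],
         [1.0, 1.0, 1.0]]
    # Columns of X are the model-space vertices.
    X = [[model[0][k], model[1][k], model[2][k]] for k in range(3)]
    return matmul_3x3(X, invert_3x3(S))


def apply_map(A, s, t):
    """Map a texture-space point (s, t) to model space."""
    return tuple(A[r][0] * s + A[r][1] * t + A[r][2] for r in range(3))
```

Because the map is affine within a triangle, any texture point inside the triangle lands on the plane of the corresponding 3D triangle.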
We applied a 2D triangle rasterization algorithm to determine the texture map pixels belonging to each triangle. Transforming the coordinates of each pixel results in a set of 3D points distributed on the surface of the object. Figure 3 below shows the distribution of the 65,536 surface points generated from a 256x256 texture map for the Garfield model. For this model, a texture map resolution of 256x256 results in a dense sampling of the surface.
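The rasterization step can be sketched as follows. The report does not say which rasterization algorithm was used, so this sketch assumes a simple bounding-box scan with an edge-function (barycentric) inside test, texture coordinates given in pixel units, and samples taken at pixel centers. Interpolating the model-space vertices with the barycentric weights is equivalent to applying the affine per-triangle map. The names `edge` and `rasterize_to_surface` are hypothetical.

```python
import math


def edge(a, b, p):
    """Twice the signed area of the 2D triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_to_surface(tex, model):
    """Return [((col, row), (x, y, z)), ...] for pixels inside the triangle.

    tex:   three (s, t) vertices in texture-pixel units (assumption).
    model: the three corresponding (x, y, z) vertices.
    """
    area = edge(tex[0], tex[1], tex[2])
    if area == 0:
        return []  # degenerate triangle covers no pixels
    min_c = int(math.floor(min(v[0] for v in tex)))
    max_c = int(math.ceil(max(v[0] for v in tex)))
    min_r = int(math.floor(min(v[1] for v in tex)))
    max_r = int(math.ceil(max(v[1] for v in tex)))
    points = []
    for row in range(min_r, max_r + 1):
        for col in range(min_c, max_c + 1):
            p = (col + 0.5, row + 0.5)  # pixel center
            # Barycentric weights; dividing by area handles either winding.
            w0 = edge(tex[1], tex[2], p) / area
            w1 = edge(tex[2], tex[0], p) / area
            w2 = edge(tex[0], tex[1], p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                xyz = tuple(w0 * model[0][k] + w1 * model[1][k] + w2 * model[2][k]
                            for k in range(3))
                points.append(((col, row), xyz))
    return points
```

Running this over every triangle of the model yields one 3D surface point per covered texture map pixel.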

Using our computed set of surface points, we can generate texture maps in a different way from the warp method used in [3]. For each surface point, we first generate an image which we call a view map. A view map is analogous to a lumisphere [4] in that it represents the different views for a single surface point. The difference is that lumispheres are a continuous parameterization of viewing directions, whereas view maps are a discrete representation in which each pixel of the view map corresponds to a different view of the object.
For each surface point, we generate its view map by tracing rays from the surface point to the camera position of each view (see Figure 4). The color of the ray is computed by bilinearly interpolating the pixels in the image plane of the view. The corresponding pixel in the view map is set to the ray color if the surface point can be seen in that view. For each ray, we check for visibility by comparing the distance of the surface point from the image plane with depth maps obtained by rendering the entire model from each camera position. This culls surface points that are occluded by the object.
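The per-ray work, once the surface point has been projected into a view, can be sketched as below. This is an illustrative sketch under several assumptions: the projection itself (giving continuous image coordinates `u`, `v` and the point's depth) is done elsewhere, images are single-channel grids stored as lists of rows, and the tolerance `eps` is a hypothetical parameter that guards against round-off in the depth comparison. The function names are hypothetical.

```python
import math


def bilinear(image, u, v):
    """Sample a 2D grid (list of rows) at continuous column u and row v."""
    height, width = len(image), len(image[0])
    u = min(max(u, 0.0), width - 1.0)   # clamp to the image
    v = min(max(v, 0.0), height - 1.0)
    c0, r0 = int(math.floor(u)), int(math.floor(v))
    c1, r1 = min(c0 + 1, width - 1), min(r0 + 1, height - 1)
    fu, fv = u - c0, v - r0
    top = (1 - fu) * image[r0][c0] + fu * image[r0][c1]
    bottom = (1 - fu) * image[r1][c0] + fu * image[r1][c1]
    return (1 - fv) * top + fv * bottom


def view_map_value(image, depth_map, u, v, point_depth, eps=1e-3):
    """Return the interpolated color if the point is visible, else None.

    depth_map holds the depth of the nearest surface seen through each pixel,
    as obtained by rendering the whole model from this camera position.
    """
    col, row = int(round(u)), int(round(v))
    if not (0 <= row < len(depth_map) and 0 <= col < len(depth_map[0])):
        return None  # projects outside the image
    if point_depth > depth_map[row][col] + eps:
        return None  # a nearer part of the object occludes this point
    return bilinear(image, u, v)
```

A `None` entry corresponds to a missing (non-visible) pixel in the view map. A tolerance that is too tight would reject points that are actually visible, which is one way round-off can produce holes.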

Since each surface point corresponds to a texture map pixel, a set of texture maps is obtained by transposing the set of view maps. The resulting texture maps produced by this method are very similar to those produced by the warp method, except that this method produces texture maps with fewer missing pixels. For example, Figure 5 below shows a texture map generated by both methods for comparison.
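The transposition amounts to swapping the roles of the surface point index and the view index. A minimal sketch, assuming the view maps are stored as a list indexed first by surface point (in row-major texture pixel order) and then by view; this storage layout and the name `transpose_view_maps` are assumptions made for illustration:

```python
def transpose_view_maps(view_maps, tex_width, tex_height):
    """Turn per-point view maps into per-view texture maps.

    view_maps[p][v] is the color of surface point p in view v, where
    p = row * tex_width + col is the texture pixel of that surface point.
    Returns texture_maps[v][row][col].
    """
    num_views = len(view_maps[0])
    return [
        [[view_maps[row * tex_width + col][v] for col in range(tex_width)]
         for row in range(tex_height)]
        for v in range(num_views)
    ]
```

Applying the same index swap in the other direction recovers the view maps from a set of texture maps.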
Figure 5: A texture map generated by (a) the warp method and (b) our method.
A few missing pixels can still be seen in Figure 5(b), which we believe are a result of round-off errors in visibility determination. The surrounding black areas represent the non-visible regions of the surface.
For the Garfield data set, we obtain the rate-distortion curves shown in Figure 6. The warp method actually results in slightly better rate-distortion performance because the holes represent "don't-care" regions that can be set to values that minimize bit-rate for the wavelet coder. Since these pixels are not used to reconstruct the images used for computing distortion, distortion is not adversely affected by the holes. We believe, however, that the holes may result in lower quality when rendering for a novel viewing direction. The difference between the curves is very small, which indicates that the warp method and our surface point based texture generation procedure are essentially equivalent in rate-distortion performance.

Representing the light field as view maps allows us to consider new parameterizations as described by Wood et al. [4]. Since we can transpose the view maps into texture maps, we can apply the 4D wavelet coder to a reparameterized view map data set.
The first step of the reparameterization is to treat the diffuse and specular components of the reflectance separately. The diffuse component, which corresponds to the Lambertian nature of the surface, looks similar from all directions and can be approximated by a single color value. On the other hand, the specular component, which arises from the mirror-like property of the surface, will appear different from different points of view.
We estimate the diffuse component for each surface point with the median color of the point over the different views, as proposed by Wood et al. [4]. The median is used rather than the mean because it is robust against outliers, such as specular highlights seen in only a few views, and therefore gives a more accurate representation of the bulk of the data.
This diffuse component is computed for all surface points, resulting in a median texture map for the entire light field data set (shown in Figure 7). This texture map is coded with a standard image compression technique such as JPEG. For this project, we assume that the bits required to represent the median texture map are negligible relative to the overall bit-rate.
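The diffuse estimate for one surface point can be sketched as follows. Two choices here are assumptions of the sketch rather than details given in the report: the median is taken independently per color channel, and views in which the point is not visible (stored as `None`) are ignored. The sketch also returns the per-view difference from the median, which is the view-dependent remainder. The name `diffuse_and_residual` is hypothetical.

```python
from statistics import median


def diffuse_and_residual(view_map):
    """Estimate the diffuse color of one surface point from its view map.

    view_map: list of (r, g, b) tuples, or None where the point is hidden.
    Returns (diffuse, residual), where residual has the same layout as
    view_map with the diffuse color subtracted from every visible entry.
    """
    visible = [c for c in view_map if c is not None]
    if not visible:
        return None, list(view_map)  # point is never seen
    # Per-channel median (assumption of this sketch).
    diffuse = tuple(median(c[k] for c in visible) for k in range(3))
    residual = [
        None if c is None else tuple(c[k] - diffuse[k] for k in range(3))
        for c in view_map
    ]
    return diffuse, residual
```

For example, with three visible samples of which one is a bright highlight, the median stays at the color shared by the other two views, whereas the mean would be pulled far toward the highlight.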

We subtract the median colors to obtain median-subtracted view maps that represent the specular component. As described in [4], a simple reflection about the surface normal improves the coherency of the view maps. Figure 8 shows the behavior of a mirror-like surface. In this situation, light originating from a particular direction is reflected about the surface normal into different views. By parameterizing the view map in terms of light source direction rather than view direction, we can increase coherency between the view maps. This is done by reflecting the viewing direction about the surface normal. For a unit normal n and unit view direction w, the reflected direction w' is given by:

$$w' = 2\,(n \cdot w)\,n - w$$
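The reflection is the standard mirror formula w' = 2 (n . w) n - w. A minimal sketch, assuming n and w are unit 3-vectors and that w points away from the surface; the name `reflect` is hypothetical:

```python
def reflect(n, w):
    """Reflect direction w about the unit normal n: w' = 2 (n . w) n - w."""
    d = sum(a * b for a, b in zip(n, w))
    return tuple(2.0 * d * a - b for a, b in zip(n, w))
```

Reflecting twice about the same normal returns the original direction, so in the continuous setting the reparameterization is lossless; any loss comes from discretizing the directions.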
Each pixel in the view map represents a discrete viewing direction. In order to implement the reflection of view maps, a discretization of the entire space of viewing directions is required. We represent the hemisphere of directions in spherical coordinates by two angles, phi and theta. In our experiments, we have 256 views represented as 16x16 view maps. In order not to increase the image resolution of the reflected view maps, we quantize each of our angles to one of 16 possible values. The reflected direction, as calculated by the equation above, falls into one of 256 viewing direction bins. In each bin, the color of the corresponding pixel from the median-subtracted view map is stored. If multiple views fall into the same bin when reflected, their respective values are averaged.
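The binning and averaging step can be sketched as below. The report does not specify the exact quantization, so this sketch makes several assumptions: directions are unit vectors in a frame whose z-axis is the pole of the hemisphere, theta (polar angle) and phi (azimuth) are each quantized uniformly into 16 bins, reflected directions that fall below the hemisphere are clamped to the last theta bin, and values are scalars (one color channel). The names `direction_bin` and `reflect_view_map` are hypothetical.

```python
import math


def direction_bin(w, n_bins=16):
    """Quantize a unit direction to a (theta_bin, phi_bin) pair."""
    x, y, z = w
    theta = math.acos(max(-1.0, min(1.0, z)))   # polar angle from the pole
    phi = math.atan2(y, x) % (2.0 * math.pi)    # azimuth in [0, 2*pi)
    # Uniform quantization (assumption); clamp to the last valid bin.
    theta_bin = min(int(theta / (math.pi / 2.0) * n_bins), n_bins - 1)
    phi_bin = min(int(phi / (2.0 * math.pi) * n_bins), n_bins - 1)
    return theta_bin, phi_bin


def reflect_view_map(samples, n_bins=16):
    """Accumulate median-subtracted values into a reflected view map.

    samples: list of (reflected_direction, value) pairs for one surface point.
    Returns an n_bins x n_bins grid; bins hit by several views hold the
    average of their values, and bins hit by none hold None.
    """
    sums = [[0.0] * n_bins for _ in range(n_bins)]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for w, value in samples:
        i, j = direction_bin(w, n_bins)
        sums[i][j] += value
        counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else None
             for j in range(n_bins)] for i in range(n_bins)]
```

The averaging is where information can be lost: once two views share a bin, their individual values can no longer be separated when the view map is unreflected.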
To reconstruct the original view maps from the reflected view maps, we reflect each viewing direction as above to determine the corresponding bin in the reflected view map. The pixel value for the original view map is set to the color value of the bin. Note that since several pixels in the original view map may map to the same bin in the reflected view map, loss in quality may occur simply due to reflection and unreflection, even before any coding of the view maps.
We evaluate the compression efficiency of using view map reparameterization for the Garfield data set. For comparison, we use the standard model-based coder as the baseline. We perform the reparameterization as explained in the previous section, and code the reflected view maps using a 4D wavelet coder.
For the Garfield data set, we obtain the rate-distortion curve shown in Figure 9. We observe that at high bit-rates, there is a decrease in image quality of approximately 2 dB for the reparameterization method. As the bit-rate decreases, this gap in image quality closes. In fact, at the lowest bit-rate (0.0015 bits per pixel), we observe that the reparameterization performs slightly better than the original approach.

We offer several reasons why the reparameterization approach does poorly compared to the original. First, the geometry used to encode the light field is approximate. Therefore, the median-subtracted view maps capture not only the specular reflectance, but also the view dependence caused by using approximate geometry. Second, the mapping when reflecting and unreflecting adversely affects the image quality: we observed that many views map into the same bin upon reflection. The final reason is that the Garfield data set is not a true light field data set. It was captured by moving the object and keeping the camera fixed, instead of moving the camera and keeping the object fixed.
Surface light fields and model-based coding are dual representations of the same data. We have established a correspondence between pixels in a texture map and points on the surface of the model. Using this correspondence, we are able to generate texture maps that can be used in the model-based coding scheme by constructing and transposing a view map representation for each surface point. The texture maps we generate are very similar to the ones generated with the model-based method. Also, by visualizing these surface points, we showed that the 256x256 texture maps used in the model-based scheme provide a dense sampling of the surface.
The view map representation also lends itself to reparameterization as done in surface light fields. Median removal and reflection can be used to separate diffuse and specular components. The reflected view maps can be encoded by a 4D wavelet coder. Although the reported results are less than encouraging, we have outlined several reasons why we believe this approach should be further investigated.